Abstract

Financial exchanges provide incentives for limit order book (LOB) liquidity provision to certain 
market participants, termed designated market makers or designated sponsors. While quoting 
requirements typically enforce the activity of these participants for a certain portion of the day, we 
argue that liquidity demand throughout the trading day is far from uniformly distributed, and thus 
this liquidity provision may not be calibrated to the demand. We propose that quoting obligations 
also include requirements about the speed of liquidity replenishment, and we recommend use of the 
Threshold Exceedance Duration (TED) for this purpose. We present a comprehensive regression 
modelling approach using GLM and GAMLSS models to relate the TED to the state of the LOB and 
identify the regression structures that are best suited to modelling the TED. Such an approach can 
be used by exchanges to set target levels of liquidity replenishment for designated market makers. 
1 Introduction


Financial exchanges have different modes of operation, or market models, for different assets. This 
is often determined by an asset’s liquidity in the prevailing period, and an exchange will endeavour 
to choose a market model that facilitates trading in the asset. As an example, the electronic 
trading system Xetra, operated by Deutsche Börse, offers continuous trading for the most liquid
assets, and the same mode of operation is offered for the second most liquid category of assets, but 
supplemented with a ‘Designated Sponsor’, who has market-making obligations. Other securities, 
such as structured products, feature a single market maker, while the less liquid assets are traded 
instead in ‘continuous auction’ mode, which features a specialist. 

The classification of assets in most electronic exchanges is performed according to their liquidity 
which is averaged over a particular period of time (typically quarterly). For assets which feature a 
Designated Sponsor (termed a Designated Market Maker in other exchanges), there are requirements 
regarding the maximum spread, minimum quote size and the effective trading time. In return
for fulfilling their quoting obligations, Designated Sponsors receive a full reimbursement of the
transaction fees incurred.

In this paper, we argue that in order to ensure high-frequency liquidity provision, exchanges
need to consider not only the average liquidity over time, but also the time required for liquidity to
be replenished, which we will explain and quantify as an indication of liquidity resilience. This is
because large orders are increasingly being partitioned by execution algorithms into multiple smaller
tranches, and traders take advantage of liquidity replenishment to improve execution¹.

Such replenishment is swift when market liquidity is resilient, and the effect of resilience on, e.g.,
optimal execution has been considered in the models of Obizhaeva and Wang (2012) and Alfonsi et
al. (2010). However, these models generally considered resilience to be constant or to have a very
simple parametric form. Thus, they did not attribute the resilience characteristics to interpretable
features of the limit order book structure.


The model of Panayi et al. (2014) instead introduced a new notion of resilience, explicitly measuring
the time for liquidity to return to a previously defined threshold level. This approach was agnostic
to the particular class of liquidity measure considered, and could therefore accommodate volume-based,
price-based and cost-of-round-trip-based measures. They showed that resilience was not constant, but
was instead related to the state of the LOB. This allowed them to understand the effect of different
structural explanatory variables of the LOB on the resilience metric constructed, and as part of this,
they considered a regression-based specification. In particular, they considered simple log-linear
regression structures to relate the response (the duration of liquidity droughts) to instantaneous and
lagged intra-daily structural regressors of the limit order book.

Using Level 2 LOB data from the multilateral trading facility Chi-X, we have access to the
state variables considered by Panayi et al. (2014), and can therefore study this notion of resilience
further in this manuscript. We significantly extend their resilience modelling framework to allow for
additional structural features, as well as a larger class of distributional models, to better explain and
capture the intra-daily liquidity resilience features of a range of assets. In particular, we consider two
classes of regression models which allow more general resilience dynamics to be captured and more
flexible distributional features to be explored, ultimately improving the fit and forecast performance
of the models. The first class is that of Generalised Linear Models, or GLMs, which assume that the
response, in our case the exceedance time over a liquidity threshold, follows a distribution from the
exponential family, conditional on the covariates. The second class is that of Generalised Additive
Models for Location, Shape and Scale, or GAMLSS, which relax this assumption and can consider a
wider, general distribution family, with the limit order book regressors entering not just into the
location (mean) relationship through a link function, but also directly into the shape and scale
parameters. This allows the models to capture the skewness and kurtosis of the liquidity profiles,
and hence the resilience of liquidity in leptokurtic and platykurtic settings.

¹ Chlistalla et al. (2011) note that the average order size is one-eighth of that of fifteen years ago in
terms of number of shares, and one-third in dollar value.

It is critical to develop these new modelling approaches, as they provide a directly interpretable
framework to inform exchanges and market making participants of how different structural features
of the LOB for a given asset affect the local liquidity resilience instantaneously within a trading day.
They thus provide insights into how best to manage and design market making activities to improve
resilience in markets. Our results reveal that considering the more flexible Generalised Gamma
distribution within a GAMLSS framework, with multiple link functions relating the LOB covariates
to the different distribution parameters, improves the explanatory performance of the model. On the
other hand, the simpler Lognormal specification also achieves respectable explanatory power, and its
estimation is more robust.

We also statistically assess the significance of the explanatory variables in greater detail, and
across datasets for companies from two different countries. We find that, in agreement with
empirically observed market features, a larger deviation of the liquidity from a given resilience
threshold level is associated with a longer deviation from that level of liquidity (a liquidity drought).
On the other hand, a higher frequency of such deviations from a liquidity threshold level is associated
with swifter returns to that level (shorter liquidity droughts). Using the proposed liquidity resilience
modelling framework, we can also determine the regimes under which we are likely to see different
structural features in the resilience behaviour.

Our results indicate that resilience considerations should also be a factor when deciding the quoting
requirements for exchange-designated liquidity providers, such as the aforementioned Designated
Sponsors. That is, along with the requirements for maximum spread and minimum volumes, they
should be subject to additional requirements for liquidity replenishment, ensuring that throughout
the trading day the LOB returns swiftly to normal levels. As we have shown that liquidity resilience
depends on the state of the LOB, exchanges can use the modelling approaches we have proposed to
determine the appropriate level of liquidity replenishment requirements, given prevailing market
conditions. In addition, liquidity providers may use the model to determine the best response to a
liquidity drought.

The remainder of this paper is organised as follows. In Section 2 we discuss incentives for liquidity
provision in the limit order book and other market structures. Section 3 introduces existing concepts
of liquidity resilience, as well as the TED metric analysed in this paper. Section 4 outlines the
regression model structures of increasing complexity employed in our analysis of liquidity resilience.
Section 5 describes the data used in this study, and Section 6 presents the results in terms of the
importance of individual covariates for explaining resilience, and the explanatory power of the
models with different regression structures and different distributional assumptions for the response.
Section 7 concludes with proposals for altering current incentive schemes for liquidity provision.


2 Ensuring uninterrupted liquidity provision via exchange incentives


In many modern financial markets and across different asset classes, a large part of liquidity provision
originates from high-frequency traders. For the equities market, a typical estimate is that at least 50%
of total volume is contributed by such market participants; see the report by the SEC (2010). However,
these firms have no legal obligation to provide continuous access to liquidity, and may (and indeed do²)
reduce their activity in times of distress. For this reason, and in order to also ensure access to liquidity
for younger, smaller-cap or more volatile stocks, exchanges provide incentives to firms to facilitate
liquidity provision. These market making obligations have been found to improve liquidity for these
assets and, by extension, also to improve year-on-year returns (Venkataraman and Waisburd, 2007;
Menkveld and Wang, 2013). Benos and Wetherilt (2012) summarise the impact of introducing
designated market makers into a stock market.


Both the incentive structure and the obligations differ across exchanges, and in particular, they
may be applicable only for certain market structures. For example, in the London Stock Exchange’s
hybrid SETS market, Designated Market Makers must maintain an executable quote for at least
90% of the trading day, as well as participate in the closing auctions, and they are also subject to
maximum spread and quote size requirements, which vary across stocks. In return, they incur no
trading fees, and are allowed to ask for the suspension of trading of an asset when prices are volatile
(Benos and Wetherilt, 2012).


As an example of specific exchange considerations for classifying assets and incentivizing liquidity
provision, we present details for the German electronic trading system Xetra, originally developed
for the Frankfurt Stock Exchange. Xetra offers a number of different trading models adapted to
the needs of its various trading groups, as well as the different asset classes. The models differ
according to³:


• Market type (e.g. number of trading parties); 

• The transparency level of available information pre- and post-trade; 

• The criteria of the order prioritisation; 

• Price determination rules; 

• The form of order execution. 


For equity trading, the following trading models are supported⁴:


• Continuous trading in connection with auctions (e.g. opening and closing auction, and possibly
one or more intra-day auctions);

• Mini-auction in connection with auctions;

• One or more auctions at pre-defined points in time.


We will focus on the first model, which is the market model that reflects the activity considered
here, i.e. in the context of the LOB. For many of the most well-known assets (such as those in the
main indices), there is sufficient daily trading interest that traders should be able to execute their
orders without much delay and without causing a significant price shift (although Xetra also offers
a price improvement service, termed Xetra BEST⁵). However, there are also less frequently traded
assets, for which Xetra tries to ensure uninterrupted liquidity provision by offering incentives to
trading members to provide quotes throughout the trading day.

² Kirilenko et al. (2014) note that during the 2010 ‘Flash Crash’, the activity of high-frequency traders
accounted for a much lower share of overall activity compared to the preceding days.

³ Xetra trading models, accessed 25/05/2015, available at http://www.xetra.com/xetra-en/trading/trading-models

⁴ Xetra Market Model Equities Release, accessed 25/05/2015, available at http://www.deutsche-boerse-cash-market.com/blob/1193332/8b79d504d5aaf80be8853817a6152ecd/data/Xetra-Market-Model-Equities-Release-15.0.pdf

⁵ Xetra continuous trading with best executor, available at http://www.xetra.com/xetra-en/trading/handelsservices/continuous-trading-with-best-executor

Xetra defines liquid equities as those in which the Xetra Liquidity Measure (XLM)⁶ does not
exceed 100 basis points and the daily order book turnover is higher than or equal to EUR 2.5m on
average over the preceding four-month period⁷. Assets for which this is not true require liquidity
provision from trading members, termed ‘Designated Sponsors’, for continuous trading to be offered;
otherwise, the assets are traded under a continuous auction model with a specialist.
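The XLM is a cost-of-round-trip measure for an order of EUR 25,000. The Python sketch below computes such a cost from a snapshot of the order book. It rests on our own assumption, not on Deutsche Börse's published methodology: the cost is taken as the gap between the volume-weighted prices of an immediate buy and an immediate sell, expressed in basis points of the mid price. The order book format and the function name are hypothetical.

```python
def round_trip_cost_bps(bids, asks, notional_eur=25_000.0):
    """Cost of immediately buying and selling `notional_eur` of an asset, in
    basis points of the mid price (a simplified, XLM-style measure).

    bids, asks: lists of (price, quantity) levels, best price first.
    """
    mid = (bids[0][0] + asks[0][0]) / 2.0

    def average_price(levels):
        # Walk the book level by level until the EUR notional is exhausted.
        remaining, shares = notional_eur, 0.0
        for price, qty in levels:
            take = min(remaining, price * qty)  # EUR value traded at this level
            shares += take / price
            remaining -= take
            if remaining <= 1e-9:
                return notional_eur / shares  # volume-weighted average price
        raise ValueError("order book too thin for the requested notional")

    buy_price = average_price(asks)   # a market buy walks up the ask side
    sell_price = average_price(bids)  # a market sell walks down the bid side
    return (buy_price - sell_price) / mid * 1e4


# Hypothetical one-level book: best bid 9.99, best ask 10.01.
print(round_trip_cost_bps([(9.99, 5000)], [(10.01, 5000)]))  # about 20 bp
```

Under the rule above, an asset whose cost stays at or below 100 basis points (and which also meets the turnover criterion) would be classed as liquid.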

Designated Sponsors have to adhere to strict quoting obligations, which are verified daily. In
return for meeting them, they have the transaction fees they generate fully reimbursed. These
quoting requirements depend on the liquidity of the asset in the preceding three-month period.
Table 1 shows both the basis for determining an asset’s liquidity class and the quoting requirements
for Designated Sponsors in each of these classes.


Liquidity class determination
Liquidity class                      LC1                  LC2                  LC3
XLM                                  < 100 basis points   < 500 basis points   < 500 basis points

General quoting requirements         LC1                  LC2                  LC3
Minimum quote size                   €20,000              €15,000              €10,000
Maximum spread (price > EUR 8.00)    2.5%                 4%                   5%
Maximum spread (price < EUR 8.00)    min{€0.20; 10.00%}   min{€0.32; 10.00%}   min{€0.40; 10.00%}
Maximum spread (price < EUR 1.00)    €0.10                €0.10                €0.10

Minimum requirements in continuous trading
Quotation duration                   90%

Table 1: Quoting requirements for Designated Sponsors in the three liquidity classes of Xetra
(reproduced from the Xetra Designated Sponsor guide, accessed 28/05/2015).


2.1 Limitations of current incentive schemes 


Designated Sponsors can select the time within the trading day for which they wish to be active,
as long as it exceeds 90% of the day on average. We argue that a more useful quoting requirement
would also reflect intra-day trading patterns, i.e. the variation in trading activity throughout the
trading day. If the 10% of the day for which the Designated Sponsor is not active corresponds to a
significant proportion of daily activity (e.g. close to the beginning and end of the trading day), then
90% activity in calendar time does not correspond to 90% participation in the day's trading.
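To make this argument concrete, the sketch below compares the share of calendar time for which a market maker quotes with the share of traded volume that falls inside those periods. The intra-day volume profile is hypothetical. It was chosen only to have heavier activity at the open.

```python
def activity_coverage(volume_by_interval, active):
    """Return (share of calendar time active, share of traded volume covered).

    volume_by_interval: traded volume in equally long intra-day intervals.
    active: booleans, True if the market maker quotes during that interval.
    """
    if len(volume_by_interval) != len(active):
        raise ValueError("inputs must have the same length")
    time_share = sum(active) / len(active)
    covered = sum(v for v, a in zip(volume_by_interval, active) if a)
    return time_share, covered / sum(volume_by_interval)


# Hypothetical profile: ten equal intervals, heavy trading in the first one.
volume = [25, 10, 8, 7, 6, 6, 7, 8, 10, 13]
active = [False] + [True] * 9  # absent only during the first interval
time_share, volume_share = activity_coverage(volume, active)
print(f"time: {time_share:.0%}, volume: {volume_share:.0%}")  # time: 90%, volume: 75%
```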

Indeed, an empirical analysis of intra-day liquidity behaviour shows that liquidity demand
throughout the trading day is far from uniformly distributed, and thus the quoting requirements
above may not have the desired effect. Figure 1 shows the proportion of the trading day for which
the spread for Sky Deutschland on the Chi-X exchange was in the top quintile for the day. Panayi
et al. (2014) attributed the larger spreads at the start of the trading day to the uncertainty of
market makers about what the fair price of the asset should be, while the second concentration
can be explained by the release of certain economic announcements.

⁶ The Xetra Liquidity Measure is a cost-of-round-trip measure, quantifying the cost to buy and
immediately sell EUR 25,000 of an asset.

⁷ Designated sponsor guide, accessed 25/05/2015, available at http://www.deutsche-boerse-cash-market.com/blob/1193330/215d37772fbec9fbc39391cbc7c5821c/data/Designated-Sponsor-Guide.pdf








Figure 1: Every line corresponds to a trading day, and the shaded regions represent periods for
which the spread (left) and XLM (right) are in the top quintile for the day, for the stock Sky Deutschland.
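The shaded regions of Figure 1 can be reproduced from a regularly sampled intra-day series of the spread. The sketch below flags the observations above the day's 80th percentile, which is the top quintile, and groups consecutive flags into periods. The regular sampling grid is our own assumption. So is the cut-off, which uses Python's `statistics.quantiles` with its default `exclusive` method.

```python
from statistics import quantiles

def top_quintile_periods(spreads):
    """Flag samples in the top quintile for the day and group them into periods.

    Returns (proportion of samples flagged, list of (start, end) index pairs,
    with `end` exclusive).
    """
    cutoff = quantiles(spreads, n=5)[-1]  # 80th percentile of the day's spreads
    flags = [s > cutoff for s in spreads]
    periods, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a shaded period begins
        elif not flagged and start is not None:
            periods.append((start, i))     # the shaded period ends
            start = None
    if start is not None:
        periods.append((start, len(flags)))
    return sum(flags) / len(flags), periods
```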


This variation in intra-day liquidity demand is prevalent in the equities class across different
markets and different industries. However, as we have seen in this section, it is not currently
considered when determining the liquidity class of an asset, and near the start of the trading day
or around important economic announcements we may have extended periods of low liquidity. We
therefore argue that both the determination of the liquidity class and the quoting requirements for
Designated Market Makers could be adjusted, in order to reflect both the absolute level of liquidity
and the speed of order replenishment after a shock (e.g. a large market order). Specifically, given
the remit of Designated Sponsors, one would expect that they should not only provide two-way
markets in periods in which there is no trading interest, but also swiftly replenish liquidity in more
illiquid periods, such as the ones indicated above.

To determine whether this is the case, we propose the use of resilience in market liquidity,
measured through the Threshold Exceedance Duration (TED) of Panayi et al. (2014), which we
introduce in the next section. We acknowledge that enforcing swift replenishment of limit orders
may lead to an increase in adverse selection costs for liquidity providers. We therefore also present
a comprehensive modelling approach to identify the most informative determinants of resilience,
and thus aid market participants in understanding how their behaviour can affect the resilience of
liquidity in the limit order book.
3 Concepts of market liquidity resilience


3.1 Liquidity resilience introduction 


The seminal paper of Kyle (1985) acknowledged the difficulty in capturing the liquidity of a financial
market in a single metric, and identified tightness, depth and resiliency as three main properties
that characterise the liquidity of a limit order book. Tightness and depth have been mainstays of
the financial literature (and indeed are easily captured through common liquidity measures such as
the spread and depth, respectively), and there is a substantial literature studying the intra-day
variation⁸ and commonality⁹ in these measures. However, resilience has received decidedly less
attention.

Panayi et al. (2014) provided a review of the state of the art in liquidity resilience and noted that
the extant definitions seemed to be divided into two categories. In the first, definitions provided by
Kyle (1985) and Obizhaeva and Wang (2012) were related to price evolution, and specifically to the
return of prices to a steady state. The second category of definitions, proposed by Garbade and
Harris (2002), was concerned with liquidity replenishment.
Figure 2: An example of the duration of the exceedance over a spread threshold. The spread
threshold c is 5 cents, $T_i$ denotes the $i$-th time instant at which the spread exceeds the threshold,
and $\tau_i$ is the duration of that exceedance.


⁸ Chan et al. (1995) found a declining intra-day spread for NASDAQ securities (an L-shaped pattern),
while Wood et al. (1985) and Abhyankar et al. (1997) found a U-shaped pattern (with larger spreads at
the beginning and at the end of the day) for the NYSE and LSE respectively. Brockman and Chung (1999)
found an inverted U-shaped pattern for the depth, which mirrors the U-shaped spread pattern (in that
the peak of the depth and the trough of the spread both correspond to higher levels of liquidity).

⁹ There is a rich literature studying the cross-sectional commonality in liquidity in equity markets
through the principal components of individual asset liquidity, starting with the work of Hasbrouck
and Seppi (2001) and extended by Korajczyk and Sadka (2008) and Karolyi et al. (2012). More recent
work by Panayi and Peters (2015), however, has identified weaknesses in the PCA and PCA regression
approaches for quantifying commonality, and suggests caution when heavy-tailed features are present
in liquidity.
The liquidity resilience notion introduced in Panayi et al. (2014) is a member of the latter
category, and was the first to explicitly define resilience in terms of any of the possible liquidity
measures and in terms of a liquidity threshold against which resilience is measured. Hence, the
concept of resilience of the liquidity measure was converted to a notion of relative resilience for a
given operating liquidity threshold that a user may specify. This was important since, as discussed in
Lipton et al. (2013), there are several different market participants in modern electronic exchanges,
and their liquidity demands and requirements differ depending on their mode of operation. In
particular, they are likely to care about relative liquidity resilience characteristics at different
liquidity thresholds, which may also depend on the type of liquidity measure being considered. All
such characteristics are easily accommodated by the framework developed in Panayi et al. (2014),
where the central concept is that liquidity is considered ‘replenished’ by the market or market maker
when a (user-specified) liquidity measure returns to a (again, user-specified) threshold. In a financial
market where liquidity is resilient, one would expect the time required for this liquidity replenishment
to be short. This replenishment time was captured by Panayi et al. (2014) through the idea of the
threshold exceedance duration (TED):


Definition 1. The threshold exceedance duration (TED) is the length of time between the point at 
which a liquidity measure deviates from a threshold liquidity level (in the direction of less liquidity), 
and the point at which it returns to at least that level again. 

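Formally, we have

$$\tau_i = \inf\{\tau : L_{T_i + \tau} \le c,\ T_i + \tau > T_i\}, \qquad (1)$$

where $L_t$ denotes the value of the liquidity measure at time $t$ (oriented so that larger values
indicate less liquidity, as for the spread) and $T_i$ is the $i$-th time in the trading day at which
liquidity deviates from the threshold level $c$. This notation is explained in detail in Section 4.4.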


This definition was designed to intentionally allow the flexibility for it to be utilised for any 
measure of liquidity of choice, be it price based, volume based or some combination. It also allowed 
for the use of different threshold liquidity levels, and the setting of a very low liquidity threshold 
(e.g. a high level of the spread), which meant that one could model the duration of low liquidity 
regimes, which would be of interest in a regulation setting. In the high-frequency liquidity provision 
setting we are considering in this paper, an exchange would be interested in modelling the time for 
return to a ‘normal’ liquidity level, which one could consider to be the median intra-day liquidity 
level. 
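The definition translates directly into a simple estimator on an event-driven series. The sketch below makes several assumptions of its own:

- The liquidity measure is held constant between observations.
- Larger values indicate less liquidity, as for the spread.
- The threshold defaults to the intra-day median, as suggested above.
- An exceedance still open at the end of the series is treated as censored and discarded.

The function name is hypothetical.

```python
from statistics import median

def threshold_exceedance_durations(times, values, c=None):
    """Threshold exceedance durations for an event-driven liquidity series.

    Assumes larger values mean less liquidity (e.g. the spread) and that the
    measure is constant between observations. If `c` is None, the intra-day
    median is used as the threshold. Returns (c, list of durations tau_i).
    """
    if c is None:
        c = median(values)
    durations = []
    start = None  # T_i: time at which the measure moved above the threshold
    for t, v in zip(times, values):
        if start is None and v > c:
            start = t
        elif start is not None and v <= c:
            durations.append(t - start)  # tau_i: time until liquidity is replenished
            start = None
    # An exceedance still open at the end of the series is censored and dropped.
    return c, durations


# Hypothetical spread observations (in ticks) at times 0..6 (in seconds).
print(threshold_exceedance_durations(range(7), [1, 3, 3, 1, 2, 4, 1]))  # (2, [2, 1])
```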


3.2 Regression structures for the modelling of liquidity resilience 

In this paper, we employ a number of regression structures in order to model the TED liquidity
resilience metric. Such regression structures provide an interpretable conditional dependence
specification between liquidity resilience, for a desired liquidity measure, and observed structural
attributes or features of a given asset’s limit order book, or other important market-based intra-day
trading volume/price/activity indicators. In addition, structural features such as known reporting
times and announcements can also be incorporated into the model explicitly, to see their influence
on core aspects of the resilience of the asset's liquidity.

Although regression models based on simple linear structures linking the mean of a response
to a linear function of the covariates have been in widespread use for over 200 years, it is only
relatively recently that such structures have begun to be significantly generalised. Innovations
in the class of parametric regression relationships available have included the incorporation of
non-linear structures, random effects, functional covariates, and relationships not just between the
covariates and the mean (location) of the response variable, but also direct relationships between
covariates and the variance/covariance, skewness, kurtosis, shape, scale and other structural features
of a number of distributions for the response variable.
The beginning of this revolution in regression modelling was heralded by the highly influential
family of Generalized Linear Models (GLMs) (see, for instance, Nelder and Baker, 1972), which was
intended to unify the extant regression approaches of the time. Up to this point, only parametric
mean regression models were considered, i.e. only the mean of the distribution was related to
the explanatory variables. Shortly after, the variance of the distribution was also modelled as a
function of explanatory variables, in the case of normal models (Harvey, 1976). Parametric variance
regression models for different assumptions on the response (e.g. for the exponential family) followed;
see further discussion regarding the adaptation to different contexts in Rigby et al.

GAMLSS, introduced by Rigby and Stasinopoulos (2005), enabled the modelling of a response
variable from a general parametric distribution. The explanatory variables of the model are then
related to each distribution parameter through a link function, which can have both linear and
non-linear components. One can explicitly see whether, under a given market regime on a given day,
the explanatory variables are likely to affect the mean, variance, skewness or kurtosis of the liquidity
resilience. More importantly, one can identify which LOB characteristics are most likely to be
influential in affecting these attributes of the resilience. By identifying these, one can then devise
market making strategies to improve resilience in a given market regime.

It is clear, therefore, that such a formulation is very general, encompassing previous approaches
such as GLMs. Indeed, the supporting R package used in this paper (Stasinopoulos and Rigby, 2007)
enables the modelling of more than 80 parametric distributions. Incorporating GAMLSS into our
approach therefore results in a highly flexible framework for modelling TED duration data, and the
next section details precisely how one can define different link functions to relate the distribution
parameters of the TED response to the LOB explanatory variables.


4 Hierarchy of regression models for liquidity resilience in the LOB

The framework we employ here aims to explain the variation in the TED random variables as a
function of independent explanatory covariates obtained from the state of the LOB at the point of
exceedance. We start with a very brief description of log-linear regression structures, in order to
introduce concepts, before moving on to Generalised Linear Models (GLM) and Generalised Additive
Models for Location, Shape and Scale (GAMLSS).

4.1 Log-linear regression structures 

For the TED random variables, since we are modelling positive random variables, we can consider
a log-linear formulation, incorporating model covariates as follows:

$$\ln(\tau_i) = x_i^\top \beta + \varepsilon_i, \qquad (2)$$

where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ is a random error term. Under the assumption that
$\tau_i \mid x_i \perp \tau_j \mid x_j$ for $i \neq j$, we can relate the expected log response to the covariates:

$$\mathbb{E}[\ln(\tau_i) \mid x_i] = x_i^\top \beta.$$

One can see that a unit change in, say, the covariate $x_{i,k}$ to $x_{i,k} + 1$ will have a multiplicative
effect of $\exp(\beta_k)$ on the response $\tau_i$, i.e. the Threshold Exceedance Duration. The sign of the
coefficient for a given covariate indicates the direction of the partial effect of this variable.
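The following is a minimal pure-Python sketch of fitting the log-linear model (2) by ordinary least squares on the log durations, and of reading off the multiplicative effect $\exp(\beta_k)$. It solves the normal equations directly for transparency. In practice, a numerical library would be used. The synthetic data and the function names are illustrative only.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_loglinear(X, ted):
    """OLS estimate of beta in ln(tau_i) = x_i' beta + eps_i.
    Each row of X must include a leading 1 for the intercept."""
    y = [math.log(t) for t in ted]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# Synthetic example: a single covariate with true beta = (0.5, 0.2).
X = [[1.0, float(x)] for x in range(5)]
ted = [math.exp(0.5 + 0.2 * x) for x in range(5)]
beta = fit_loglinear(X, ted)
print(f"multiplicative effect per unit of x: {math.exp(beta[1]):.4f}")  # 1.2214
```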
4.2 GLM


While convenient, the log-linear regression structure is also restrictive in the model’s expressive 
power and flexibility, and we may therefore opt to consider instead a GLM. The construction of 
such a parametric regression model requires the specification of three key components: 


1. A distribution for the response random variables. In this context, this is the conditional 
distribution of the TED, given the covariates constructed from the LOB structure. 

2. The conditional mean structure of the TED, which links the linear regression model composed
of the independent observed explanatory covariates to the response (typically through a
transformation known as the link function).

3. A specification of the variance function, perhaps also as a function of the mean. 


GLMs enable us to fit models when the response variable belongs to the exponential family of
distributions, i.e.

$$f(\tau \mid \theta, \phi) = \exp\left\{ \frac{\tau\theta - b(\theta)}{a(\phi)} + c(\tau, \phi) \right\}, \qquad (3)$$

where $\theta$ is the location (natural) parameter, $\phi$ the scale parameter, and $a(\phi)$, $b(\theta)$, $c(\tau, \phi)$
are known functions defining particular subfamilies.

A GLM relates the expected response $\mu_i = \mathbb{E}[\tau_i]$ to a linear predictor $x_i^\top\beta$ through a
link function $g(\cdot)$, i.e. $g(\mu_i) = x_i^\top\beta$. When the response is normally distributed and the link
function is the identity, this is equivalent to a standard linear model. If $g(\mu) = \theta$, then $g(\cdot)$ is
called the canonical link and we have $\theta_i = x_i^\top\beta$.

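Let us consider two members of the exponential family which are widely used for analysing
duration data as distributions for our response. The first is the Gamma distribution

$$f(\tau \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \tau^{\alpha - 1} \exp(-\beta\tau), \qquad \tau > 0,\ \alpha > 0,\ \beta > 0, \qquad (4)$$

which is in the exponential family with $\theta = \beta/\alpha$, $\phi = 1/\alpha$, $a(\phi) = -\phi$ and $b(\theta) = \log(\theta)$.
We use the reparameterisation

$$f(\tau \mid \mu, \sigma) = \frac{\tau^{1/\sigma^2 - 1} \exp\left(-\tau/(\sigma^2\mu)\right)}{(\sigma^2\mu)^{1/\sigma^2}\, \Gamma(1/\sigma^2)}, \qquad \tau > 0,\ \mu > 0,\ \sigma > 0, \qquad (5)$$

so that $\mathbb{E}[\tau] = \mu$. Then $\sigma^2 = 1/\alpha$ and $\mu = \alpha/\beta$, and we see that the linear predictor
$x_i^\top\beta$ would be related to both parameters. However, if the shape parameter $\alpha$ is fixed, only the
scale parameter $\beta$ varies with the linear predictor.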
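For the Gamma GLM with log link, the iteratively reweighted least squares (IRLS) algorithm used to fit GLMs simplifies. Since $V(\mu) = \mu^2$ and $g(\mu) = \log \mu$, the working weights are all equal to one. The sketch below exploits this and is restricted to a single covariate for brevity. It illustrates the technique under these assumptions and is not the estimation code used for the results in this paper. The function name is hypothetical.

```python
import math

def fit_gamma_glm_log_link(x, ted, n_iter=100, tol=1e-12):
    """Fit log(mu_i) = b0 + b1 * x_i for a Gamma response by IRLS.

    For the Gamma family V(mu) = mu**2 and for the log link g'(mu) = 1/mu, so
    the working weights 1 / (V(mu) * g'(mu)**2) are all 1 and each IRLS step is
    an ordinary least-squares fit of the working response
    z_i = eta_i + (tau_i - mu_i) / mu_i on x_i.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = math.log(sum(ted) / n), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        z = [math.log(m) + (t - m) / m for m, t in zip(mu, ted)]
        zbar = sum(z) / n
        new_b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
        new_b0 = zbar - new_b1 * xbar
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Synthetic durations lying exactly on log(mu) = 0.3 + 0.4 x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
ted = [math.exp(0.3 + 0.4 * xi) for xi in x]
b0, b1 = fit_gamma_glm_log_link(x, ted)
```

The dispersion $\sigma^2$ cancels from the IRLS weights, so the coefficient estimates do not depend on it. It can be estimated afterwards from the residuals if needed.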

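We also consider a second member of the exponential family (for fixed shape), namely the Weibull
distribution, given by

$$f(\tau \mid \mu, \sigma) = \frac{\sigma\, \tau^{\sigma - 1}}{\alpha^{\sigma}} \exp\left\{-\left(\frac{\tau}{\alpha}\right)^{\sigma}\right\}, \qquad \tau > 0,\ \mu > 0,\ \sigma > 0, \qquad (6)$$

where $\alpha = \mu/\Gamma(1/\sigma + 1)$, so that $\mathbb{E}[\tau] = \mu$. For fixed $\sigma$, this is a member of the exponential
family in the statistic $\tau^{\sigma}$, with $a(\phi) = \phi = 1$, $\theta = -\alpha^{-\sigma}$ and, up to an additive constant
that can be absorbed into $c(\tau, \phi)$, $b(\theta) = \sigma \ln \mu$. As in the case of the Gamma distribution, if
the shape parameter is fixed, only the scale parameter varies with the linear predictor.

In order to ensure that $\mu$ is positive, in both cases we use the log link, i.e. $g(\mu) = \log(\mu)$. For
the exponential family of distributions, one can obtain the conditional expectation of the response
as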
$$\mathbb{E}[\tau \mid x] = b'(\theta) = \mu \qquad (7)$$

and the variance as

$$\operatorname{Var}[\tau \mid x] = a(\phi)\, b''(\theta) = a(\phi)\, V(\mu); \qquad (8)$$

see the derivation in McCullagh and Nelder (1989). It is clear, then, that the formulation of the
exponential family model above allows one to model cases where the response variables are of the
same distributional form and are independent, but not identically distributed, in that they may have
different means and variances. In this case, the variance varies as a function of the mean.
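The following numerical check uses illustrative parameter values. It confirms that the parameterisations in (5) and (6) have mean $\mu$, and that the Gamma variance equals $\sigma^2\mu^2$, so that $V(\mu) \propto \mu^2$. The integration range and grid are ad hoc choices for these values. The helper names are hypothetical.

```python
import math

def gamma_pdf(tau, mu, sigma):
    """Reparameterised Gamma density (5), with E[tau] = mu."""
    a = 1.0 / sigma**2  # shape alpha
    return math.exp((a - 1.0) * math.log(tau) - tau / (sigma**2 * mu)
                    - a * math.log(sigma**2 * mu) - math.lgamma(a))

def weibull_pdf(tau, mu, sigma):
    """Weibull density (6) with scale alpha = mu / Gamma(1/sigma + 1)."""
    alpha = mu / math.gamma(1.0 / sigma + 1.0)
    return (sigma / alpha) * (tau / alpha) ** (sigma - 1.0) * math.exp(-((tau / alpha) ** sigma))

def moment(pdf, k, upper=40.0, n=40_000):
    """k-th raw moment (k >= 1) by the composite trapezoidal rule on [0, upper].
    The integrand vanishes at 0 for k >= 1 and the parameters used below."""
    h = upper / n
    total = 0.0
    for i in range(1, n + 1):
        t = i * h
        weight = 0.5 if i == n else 1.0
        total += weight * t**k * pdf(t)
    return total * h


m1 = moment(lambda t: gamma_pdf(t, 2.0, 0.5), 1)
m2 = moment(lambda t: gamma_pdf(t, 2.0, 0.5), 2)
print(m1, m2 - m1**2)  # close to mu = 2 and sigma^2 * mu^2 = 1
```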

De Jong et al. (2008) suggest that there are many functions $V(\mu)$ that cannot arise from an exponential family distribution. In addition, one may also want to consider a distribution which is more flexible in terms of skewness and kurtosis. In this case, we propose the flexible generalised gamma distribution class of models given by

$$ f_T(\tau; b, a, k) = \frac{b\, \tau^{bk - 1}}{a^{bk}\, \Gamma(k)} \exp\left\{-\left(\frac{\tau}{a}\right)^{b}\right\}, \quad \tau > 0,\ k > 0,\ a > 0,\ b > 0, \tag{9} $$


which was first introduced by Stacy (1962) and considered further in a reparameterised form by Lawless (1980). This distribution has the additional advantage that it has a closed form expression for the quantile function, which we will see in a later section. This means we can also explicitly study the relationship between quantiles of the TED and certain LOB covariates, i.e. also interpret the resulting regression as a quantile regression (Noufaily and Jones, 2013). As the generalised gamma distribution is not a member of the exponential family of distributions, it cannot be modelled using a GLM. We will thus show in the next subsection how one can model it using the GAMLSS framework.
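To make the closed-form quantile claim concrete, a standard change of variables (our own sketch, for $b > 0$) shows that under (9) the variable $Z = (\tau/a)^b$ follows a unit-rate gamma distribution with shape $k$. Writing $G_k$ for that distribution function (the regularised lower incomplete gamma function), the TED quantile function is a power transform of a gamma quantile:

```latex
\begin{aligned}
P(\tau \le q) &= P\!\left(Z \le (q/a)^{b}\right) = G_k\!\left((q/a)^{b}\right) = u \\
\Longrightarrow\quad Q_\tau(u) &= a \left[ G_k^{-1}(u) \right]^{1/b}, \qquad u \in (0,1).
\end{aligned}
```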


4.3 GAMLSS 

GAMLSS requires a parametric distribution assumption for the response variable, but this distribution can come from a general family, rather than only from the exponential family, as in the case of the GLM. GAMLSS also differs from the GLM in that, whereas the GLM assumes that only the expected response is related to the predictors through a link function, GAMLSS has separate link functions relating each of the distribution parameters to the explanatory variables. As such, it enables one to capture features such as overdispersion or positive and negative skew in the response data. We base the formulation presented here on that of Rigby and Stasinopoulos (2005).

Let $\tau^\top = (\tau_1, \ldots, \tau_n)$ be the vector of responses and let us assume a density $f(\tau_i \mid \theta_i)$, where $\theta_i = (\theta_{1,i}, \theta_{2,i}, \theta_{3,i}, \theta_{4,i}) = (\mu_i, \sigma_i, \kappa_i, \nu_i)$ are the distribution parameters, the first two relating to location and scale and the others typically relating to shape. Let also $X_k$ be a fixed, known design matrix containing the covariates at the point of exceedance $T_i$ for each observed TED random variable. In this setting, we can define the following link functions to relate the $k$-th distribution parameter $\theta_k$ to the explanatory variables in $X_k$ and $\{Z_{jk}\}_{j=1}^{J_k}$:

$$ g_k(\theta_k) = X_k \beta_k + \sum_{j=1}^{J_k} Z_{jk} \gamma_{jk}, \tag{10} $$

where the terms $Z_{jk}\gamma_{jk}$ are optional components for the incorporation of random effects.

GAMLSS are referred to as semi-parametric regression models, as they assume a parametric distribution for the response while allowing non-parametric smoothing functions of the explanatory variables. To see this, let $Z_{jk} = I_n$, the $n \times n$ identity matrix, and let $\gamma_{jk} = h_{jk} = h_{jk}(x_{jk})$. Then the semi-parametric form is obtained as

$$ g_k(\theta_k) = X_k \beta_k + \sum_{j=1}^{J_k} h_{jk}(x_{jk}). \tag{11} $$


Hence, under the GAMLSS framework, the parameters of the distribution can be modelled using linear functions as well as flexible smoothing functions (e.g. cubic splines) of the explanatory variables, in addition to random effects. A parametric linear model can be recovered when only linear functions of the explanatory variables are considered; with the identity link function this is

$$ g_k(\theta_k) = \theta_k = X_k \beta_k. \tag{12} $$
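The following minimal Python sketch illustrates the parametric case: each distribution parameter gets its own linear predictor, mapped back through the inverse of its link. The links follow Table 3 for the generalised gamma model (log for $\mu$ and $\sigma$, identity for $\nu$); the coefficient values and the two covariates are hypothetical, and smooth terms (11) and random effects (10) are omitted.

```python
import math

# Inverse link functions g_k^{-1}.
INVERSE_LINK = {"log": math.exp, "identity": lambda eta: eta}


def linear_predictor(x, coef):
    """eta = beta_0 + sum_s beta_s * x_s, with the intercept stored first."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def distribution_parameters(x, coefs, links):
    """theta_k = g_k^{-1}(x' beta_k) for every distribution parameter k."""
    return {name: INVERSE_LINK[links[name]](linear_predictor(x, c))
            for name, c in coefs.items()}


links = {"mu": "log", "sigma": "log", "nu": "identity"}
coefs = {                      # hypothetical coefficient vectors
    "mu": [0.5, 0.8, -0.3],
    "sigma": [-1.0, 0.1, 0.2],
    "nu": [0.7, 0.0, 0.0],
}
params = distribution_parameters([1.2, 0.0], coefs, links)
print(params)
```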


Using this specification, we can consider very flexible multiple-parameter distributions, such as the Generalised Gamma distribution (hereafter g.g.d.). We assume that the TED random variables are conditionally independent given the LOB covariates, i.e. $\tau \sim F(\tau; k, a, b)$, with the density given in Equation (9). The g.g.d. family includes as sub-families several popular parametric models: the exponential model ($b = k = 1$), the Weibull distribution ($k = 1$), the Gamma distribution ($b = 1$) and the Lognormal model as a limiting case (as $k \to \infty$).
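The sub-family relations can be checked numerically. This Python sketch (standard library only; the helper names are illustrative) implements the density (9) and compares it with the exponential, Weibull and gamma densities at a few points; the lognormal limit as $k \to \infty$ is not checked here.

```python
import math


def gg_pdf(tau, b, a, k):
    """Generalised gamma density (9)."""
    return (b * tau ** (b * k - 1) / (a ** (b * k) * math.gamma(k))
            * math.exp(-(tau / a) ** b))


def exponential_pdf(tau, scale):
    return math.exp(-tau / scale) / scale


def weibull_pdf(tau, shape, scale):
    z = tau / scale
    return shape / scale * z ** (shape - 1) * math.exp(-z ** shape)


def gamma_pdf(tau, shape, scale):
    return tau ** (shape - 1) * math.exp(-tau / scale) / (scale ** shape * math.gamma(shape))


for tau in (0.2, 1.0, 3.5):
    assert math.isclose(gg_pdf(tau, 1.0, 2.0, 1.0), exponential_pdf(tau, 2.0))   # b = k = 1
    assert math.isclose(gg_pdf(tau, 1.7, 2.0, 1.0), weibull_pdf(tau, 1.7, 2.0))  # k = 1
    assert math.isclose(gg_pdf(tau, 1.0, 2.0, 3.0), gamma_pdf(tau, 3.0, 2.0))    # b = 1
```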

We now wish to relate this statistical model assumption to a set of explanatory variables (covariates) derived from current and lagged values of the LOB. In practice, to achieve this, one could work on the log scale with $\ln(\tau)$, i.e. with the log-generalised gamma distribution, as this parameterisation is known to improve identifiability and estimation of parameters; this point is discussed in significant detail in Lawless (1980). Instead, as we employ the gamlss R package of Stasinopoulos and Rigby (2007) for estimation, we use the following reparameterisation

$$ f(\tau \mid \mu, \sigma, \nu) = \frac{|\nu|\, \theta^{\theta} \left(\tau/\mu\right)^{\nu\theta} \exp\left\{-\theta \left(\tau/\mu\right)^{\nu}\right\}}{\Gamma(\theta)\, \tau}, \tag{13} $$

where $\theta = 1/(\sigma^2 \nu^2)$. This corresponds to the parameterisation in Equation (9) under the transformation

$$ b = \nu, \qquad k = \theta = \frac{1}{\sigma^2 \nu^2}, \qquad a = \mu\, \theta^{-1/\nu} \qquad (\nu > 0). \tag{14} $$
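The correspondence between (13) and (9) can likewise be verified numerically. The Python sketch below codes (13) directly from the formula (it is not a call into the gamlss package) and applies the transformation $b = \nu$, $k = \theta = 1/(\sigma^2\nu^2)$, $a = \mu\,\theta^{-1/\nu}$, which assumes $\nu > 0$; the parameter values are arbitrary.

```python
import math


def gg_pdf_stacy(tau, b, a, k):
    """Generalised gamma density in the form (9)."""
    return (b * tau ** (b * k - 1) / (a ** (b * k) * math.gamma(k))
            * math.exp(-(tau / a) ** b))


def gg_pdf_mu_sigma_nu(tau, mu, sigma, nu):
    """Generalised gamma density in the form (13), theta = 1 / (sigma^2 nu^2)."""
    theta = 1.0 / (sigma ** 2 * nu ** 2)
    z = (tau / mu) ** nu
    return (abs(nu) * theta ** theta * z ** theta * math.exp(-theta * z)
            / (math.gamma(theta) * tau))


def to_stacy(mu, sigma, nu):
    """Map (mu, sigma, nu), nu > 0, to the (b, a, k) of form (9)."""
    theta = 1.0 / (sigma ** 2 * nu ** 2)
    return nu, mu * theta ** (-1.0 / nu), theta


b, a, k = to_stacy(5.0, 0.4, 1.3)
for tau in (0.5, 2.0, 5.0, 12.0):
    assert math.isclose(gg_pdf_mu_sigma_nu(tau, 5.0, 0.4, 1.3), gg_pdf_stacy(tau, b, a, k))
```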

The regression structure we adopt for the g.g.d. model involves a log link for the time-varying parameter $\mu$:

$$ \ln\left(\mu(x_t)\right) = \beta_0 + \sum_{s=1}^{p} \beta_s x_t^{(s)}, \tag{15} $$

with $p$ covariates $x_t = (x_t^{(1)}, \ldots, x_t^{(p)})$ measured instantaneously at the point of exceedance $t = T_i$; the link functions for the parameters $\sigma$ and $\nu$ can be found in Table 3. Each of the covariates is a transform of the LOB for which the liquidity measure is observed, and all covariates are described in Section 4.4.1. We note that we also considered models with interactions between the covariates, but interaction terms were not found to be significant in the majority of our models.

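Under a model with this regression structure, the conditional mean of the duration is also related directly to this linear structure: for the $i$-th exceedance of the threshold, with $\theta = 1/(\sigma^2\nu^2)$ as in (13), we have

$$ \mathrm{E}[\tau_i \mid x_{T_i}] = \exp\left(\beta_0 + \sum_{s=1}^{p} \beta_s x_{T_i}^{(s)}\right) \frac{\Gamma(\theta + 1/\nu)}{\theta^{1/\nu}\, \Gamma(\theta)}; \tag{16} $$

see details in Lo et al. (2002).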
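A small Python sketch of the conditional mean $\exp(\beta_0 + \sum_s \beta_s x^{(s)})\,\Gamma(\theta + 1/\nu)/(\theta^{1/\nu}\Gamma(\theta))$, with $\theta = 1/(\sigma^2\nu^2)$, follows. It works on the log scale via math.lgamma to avoid overflow of $\Gamma(\theta)$ when $\theta$ is large; the covariate and coefficient values are hypothetical. When $\nu = 1$ the gamma-function ratio equals one, so the conditional mean reduces to $\mu$, as it should for the gamma special case.

```python
import math


def gg_conditional_mean(x, beta, sigma, nu):
    """Conditional mean of the g.g.d. TED; beta[0] is the intercept, x excludes the constant."""
    theta = 1.0 / (sigma ** 2 * nu ** 2)
    log_mu = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # log of Gamma(theta + 1/nu) / (theta**(1/nu) * Gamma(theta))
    log_ratio = math.lgamma(theta + 1.0 / nu) - math.lgamma(theta) - math.log(theta) / nu
    return math.exp(log_mu + log_ratio)


x, beta = [0.5, 2.0], [0.1, 0.4, -0.2]    # hypothetical covariates and coefficients
print(gg_conditional_mean(x, beta, sigma=0.5, nu=1.0))   # equals exp(-0.1) up to rounding
print(gg_conditional_mean(x, beta, sigma=0.4, nu=1.5))
```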
4.4 Notation and definitions of LOB variables and regression model
components 

We define a level of the LOB as a price level at which there is at least one resting order: the first level of the bid side is then the first price level below the price midpoint at which there is a resting buy order. In addition, we utilise the following notation for a single asset, on a single trading day.

• $a$ denotes the ask, $b$ denotes the bid.

• $P_t^{b,i} \in \mathbb{N}^{+}$ denotes the random variable for the limit price of the $i$-th level bid at time $t$, in tick units.

• $P_t^{a,i} \in \mathbb{N}^{+}$ denotes the random variable for the limit price of the $i$-th level ask at time $t$, in tick units.

• $V_t^{b,i} \in \mathbb{N}^{n}$ denotes a column random vector of orders at the $i$-th level bid at time $t$, with $n$ being the number of such orders (and analogously $V_t^{a,i}$ for the ask).

• $L_t$ denotes a random variable at time $t$ for the generic proxy for the liquidity measure.

• $c$ denotes the exceedance threshold level, defined relative to the liquidity measure $L_t$; $c$ is deterministic and constant over time.

• $T_i$ denotes the $i$-th random time instant in a trading day at which the liquidity measure exceeds the threshold $c$. Formally, we define $T_i = \inf\{t : L_t > c,\ t > T_{i-1},\ t > T_0\}$, where $T_0$ denotes the start of the observation window (1 minute after the start of the trading day).

• $\tau_i$ denotes the duration of time in ms, relative to the exceedance event $T_i$, that the liquidity measure $L_t$ remains above the threshold $c$. These are the response random variables which correspond to the TED.
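The following Python sketch shows one way to extract the pairs $(T_i, \tau_i)$ from a sequence of timestamped updates of the liquidity proxy $L_t$. It rests on assumptions of our own that go beyond the definitions above: $L_t$ is treated as piecewise constant between updates, an exceedance is counted when $L_t$ moves from at or below $c$ to above $c$ after $T_0$, it ends at the first subsequent update with $L_t \le c$, and an exceedance still open at the end of the data is dropped as right-censored.

```python
from typing import List, Sequence, Tuple


def threshold_exceedance_durations(
    times_ms: Sequence[float],
    values: Sequence[float],
    c: float,
    t0_ms: float,
) -> List[Tuple[float, float]]:
    """Return (T_i, tau_i) pairs, tau_i in ms, for a piecewise-constant proxy L_t."""
    events = []
    start = None      # T_i of the exceedance currently in progress, if counted
    above = False     # whether L_t > c just before the current update
    for t, v in zip(times_ms, values):
        now_above = v > c
        if now_above and not above and t > t0_ms:
            start = t                          # upward crossing: new T_i
        elif not now_above and above and start is not None:
            events.append((start, t - start))  # liquidity back at or below c
            start = None
        above = now_above
    return events


# Spread (in ticks) observed at update times in ms; threshold c = 2 ticks.
times = [0, 500, 1000, 1200, 2000, 2600, 3000]
spread = [1, 3, 1, 4, 5, 2, 6]
print(threshold_exceedance_durations(times, spread, c=2, t0_ms=100))
# [(500, 500), (1200, 1400)]
```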

4.4.1 Model LOB Covariates 

For each TED random variable $\tau_i$ we consider the corresponding contemporaneous covariates in our regression design matrix, i.e. at the times of exceedance above the specified liquidity threshold, $t = T_i$. In the following, a 'level' of the LOB is defined as one in which there is at least one resting limit order. Thus the first 5 levels of the bid are the 5 levels closest to the quote mid-point where there is available volume for trading. The covariates chosen pertain to the state of the limit order book of one given stock.


• The total number of asks in the first 5 levels of the LOB at time $t$, obtained according to $x_t^{(1)} = \sum_{i=1}^{5} |V_t^{a,i}|$ (where $|V_t^{a,i}|$ is the number of orders at a particular level), and denoted ask hereafter.

• The total number of bids in the first 5 levels of the LOB at time $t$, obtained according to $x_t^{(2)} = \sum_{i=1}^{5} |V_t^{b,i}|$, denoted bid.

• The total ask volume in the first 5 levels of the LOB at time $t$, obtained according to $x_t^{(3)} = \sum_{i=1}^{5} \mathbf{1}^\top V_t^{a,i}$, denoted askVolume.

• The total bid volume in the first 5 levels of the LOB at time $t$, obtained according to $x_t^{(4)} = \sum_{i=1}^{5} \mathbf{1}^\top V_t^{b,i}$, denoted bidVolume.

• The number of bids $x_t^{(5)}$ in the LOB that had received price or size revisions (and were thus cancelled and resubmitted with the same order ID), denoted by bidModified.

• The number of asks in the LOB that had received price or size revisions, denoted by askModified.

• The average age (in ms) of bids in the first 5 levels at time $t$, denoted by bidAge.

• The average age $x_t^{(8)}$ of asks in the first 5 levels at time $t$, denoted by askAge.

• The instantaneous value of the spread at the point at which the $i$-th exceedance occurs, given by $x_t^{(9)} = P_t^{a,1} - P_t^{b,1}$ and denoted as spreads.

• For the nine previously defined covariates, we also include exponentially weighted lagged versions. For example, for a covariate $x_t^{(j)}$, the respective lagged covariate value is given by

$$ \tilde{x}_t^{(j)} = \sum_{n=1}^{d} w^{n}\, x_{t - n\Delta}^{(j)}, \tag{17} $$

where $w = 0.75$ is the weighting factor, $d = 5$ is the number of lagged values we consider and $\Delta = 1$ s is the interval between the lagged values. These covariates are hereafter denoted with the prefix 'l' (e.g. lask, lbid).

• The number of previous TED observations in the interval $[t - \delta, t]$, with $\delta = 1$ s, denoted by prevexceed.

• The time since the last exceedance, denoted by timelast.

• The average of the last 5 TEDs, denoted by prevTEDavg.

• The activity in the associated CAC40 index (in number of order additions, cancellations and executions) in the previous second, denoted by indact.

• A dummy variable indicating if the exceedance occurred as a result of a market order to buy, denoted by mobuy.

• A dummy variable indicating if the exceedance occurred as a result of a market order to sell, denoted by mosell.

Altogether we then have 24 covariates: 15 instantaneous and 9 lagged.
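As an illustration of the covariate construction, the Python sketch below computes the order count and volume over the first five non-empty levels of one side of the book, the spread, and an exponentially weighted lag of the form $\sum_{n=1}^{d} w^n x_{t - n\Delta}$ with $w = 0.75$, $d = 5$ and $\Delta = 1$ s. The input layout (a list of per-level lists of order sizes, best level first) and the callable used to look up past covariate values are assumptions made for this example, not the data format used in the study.

```python
def orders_and_volume(levels, depth=5):
    """Number of resting orders and total volume in the first `depth` levels.

    levels: list of price levels, best first; each level is a list of order sizes.
    """
    top = levels[:depth]
    return sum(len(orders) for orders in top), sum(sum(orders) for orders in top)


def spread(best_ask_ticks, best_bid_ticks):
    """Instantaneous spread in ticks, P^{a,1}_t - P^{b,1}_t."""
    return best_ask_ticks - best_bid_ticks


def ew_lagged(value_at, t, w=0.75, d=5, delta=1.0):
    """Exponentially weighted lag: sum_{n=1}^{d} w**n * x(t - n * delta).

    value_at: callable returning the covariate value at a given time (in s).
    """
    return sum(w ** n * value_at(t - n * delta) for n in range(1, d + 1))


bid_levels = [[100, 50], [200], [10, 10, 10], [5], [70], [999]]
print(orders_and_volume(bid_levels))  # (8, 455)
print(spread(10012, 10010))           # 2
print(ew_lagged(lambda s: 1.0, t=10.0))
```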


5 Data description 


We use an 82-day trading sample (January 2 to April 27, 2012) of all order submissions, executions and cancellations in the limit order book for 20 German and French stocks traded on Chi-X.¹⁰ Information for these assets is provided in Table, and they were chosen to include both small and large cap stocks in a number of different industries. We note that while the trading hours for Chi-X are 08:00 to 16:30, we do not consider activity before 08:01 and after 16:29, in order to avoid market opening and closing effects.

¹⁰Chi-X was a pan-European multilateral trading facility (MTF) which merged with BATS in 2012. For the period under consideration, it accounted for between a quarter and a third of total trading activity in the French and German stocks considered; for more details see http://www.liquidmetrix.com/LiquidMetrix/Battlemap.

There is a degree of flexibility regarding the manner in which one selects the threshold levels over which to consider exceedances. Panayi et al. (2014) suggest that these may be specified based on an interest in particular liquidity resilience scenarios to be studied; in other cases they can be specified based on historical observations of the empirical distribution of the selected proxy for a liquidity measure. The implications of each choice are discussed in Panayi et al. (2014). For this paper we consider exceedances over the median threshold level, which could be of interest in the market making setting described in this paper, as well as exceedances over the daily 95% threshold level.
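One simple way to set the two thresholds used here is from the empirical distribution of the day's liquidity proxy. The Python sketch below uses the standard-library statistics.quantiles function; it assumes the proxy has already been sampled on a regular time grid, which is our assumption rather than a detail given in the text, and the toy sample is purely illustrative.

```python
import statistics


def daily_thresholds(proxy_samples, percentiles=(50, 95)):
    """Empirical percentile thresholds for one trading day of a liquidity proxy."""
    cut_points = statistics.quantiles(proxy_samples, n=100, method="inclusive")
    return {p: cut_points[p - 1] for p in percentiles}


sample = list(range(1, 102))       # toy sample: the values 1, 2, ..., 101
print(daily_thresholds(sample))    # {50: 51.0, 95: 96.0}
```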


6 Results and Discussion 

In the following model selection procedure we assess both the appropriateness of the various distributional assumptions one might make for the TED response and the importance of the different covariates in explaining the variation in the TED, in the interest of obtaining a parsimonious model. Panayi et al. (2014) explained that the assumption of stationarity in liquidity resilience over an extended period is not supported by the data, and for this reason we also fit the model individually for each day and each asset, where daily local stationarity is reasonably assumed. We will first identify the covariates that are most frequently found to be significant in daily regressions for different assets and for most of the period under consideration, in order to obtain a parsimonious covariate subset. We will then proceed by comparing the explanatory power of the regression model under lognormal, Weibull, gamma and Generalised gamma distributional assumptions for the response random variable.


6.1 Model selection - covariate significance 

For the empirical evaluation of the importance of the various LOB covariates in the regression, we selected a lognormal model specification. This is because the simple linear formulation of this model (a linear regression for $\ln \tau$) enables us to use existing model selection techniques in order to identify the model structure that produces the highest explanatory power for each daily regression for every asset. We evaluate the explanatory power of this model in terms of the proportion of the variation in the TED resilience measure that can be explained by the selected model covariates, as captured by the adjusted coefficient of determination (adjusted $R^2$).
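For reference, the adjusted $R^2$ penalises the ordinary $R^2$ for the number of covariates $v$ in a model fitted to $n$ observations, $\bar R^2 = 1 - (1 - R^2)(n - 1)/(n - v - 1)$. A minimal Python version, applied to log TED observations and fitted values from some linear model (the numbers are made up), is:

```python
def r_squared(y, y_hat):
    """Coefficient of determination for observations y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y, y_hat, n_covariates):
    """Adjusted R^2 for a model with an intercept and n_covariates regressors."""
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - n_covariates - 1)


log_ted = [1.0, 2.0, 3.0, 4.0, 5.0]     # illustrative ln(tau_i) values
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(log_ted, fitted), adjusted_r_squared(log_ted, fitted, 1))
```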

We first fit a multiple regression model, in which all covariates explained in Section 4.4.1 are considered, in each daily model for the entire 4 months of our dataset, for each asset under consideration. We also considered interactions between covariates, but these were not found to be significant in the vast majority of cases. Figure 3 shows the adjusted $R^2$ values obtained from fitting the full model, using as a threshold either the median or the 95th percentile spread, obtained every day. We find that for the vast majority of the stocks, the median adjusted $R^2$ value is over 10%. For some stocks we find even more remarkable median adjusted $R^2$ values of over 20%, rising to as much as 50% for some daily models.

We then used the branch-and-bound algorithm implemented in the leaps package in R (Lumley, 2004) in order to identify the best scoring model (in terms of the adjusted $R^2$ value). In this context, a model subspace is the set of all possible models containing a particular number $v$ of covariates from the LOB. For example, the full model contains all covariates and is the only model in its subspace, while the smallest model subspace is comprised of models that contain the intercept and any one of the possible covariates. Intermediate model subspaces are comprised of models with all combinations of $v = 2, \ldots, n - 1$ covariates, where $n = p + m$ is the total number of covariates ($p$ contemporaneous plus $m$ lagged). There are thus $\binom{n}{v}$ models in each model subspace.

Figure 3: Boxplots of the adjusted $R^2$ values obtained from fitting the full regression model separately for each day in our dataset, for both the threshold corresponding to the 5th decile of the spread (the median, red) and the 95th percentile threshold (blue).
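The subspace sizes grow quickly with the number of covariates. With the $n = 24$ covariates of Section 4.4.1, the short Python computation below shows why exhaustive enumeration is costly and why a branch-and-bound search such as the one in leaps, which avoids fitting every candidate, is attractive. The count excludes the intercept-only model.

```python
from math import comb


def subspace_sizes(n_covariates):
    """Number of candidate models with v covariates, for v = 1, ..., n."""
    return {v: comb(n_covariates, v) for v in range(1, n_covariates + 1)}


sizes = subspace_sizes(24)
print(sizes[1], sizes[12], sizes[24])  # 24 2704156 1
print(sum(sizes.values()))             # 16777215, i.e. 2**24 - 1
```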

To illustrate our findings we first present results for a given day of data for Credit Agricole in Figure, where for all model subspaces we rank the models in the subspace based on their adjusted $R^2$ score. We thus obtain the best combination of covariates for each subspace and for each day of data. We can then identify the covariates that are consistently present as we move between model subspaces. This is interesting because it gives us a relative measure of the contribution of each covariate across different assumptions of parsimony for the model. Particularly for higher dimensional model subspaces, some of the covariates in each subset model are not significant, and we distinguish between the covariates that are significant or not at the 5% level of significance.

To get an indication of the time stability of these model structures (and identify covariates that are consistently selected in the model), we illustrate the relative frequency with which covariates appear in the best models of every subspace. That is, for each model subspace, we count the number of times each covariate forms part of the model with the highest adjusted $R^2$ value over the four-month period. Figure indicates that the covariates identified earlier as being important in explaining the variation in the TED for a single day (prevTEDavg and spreads) are also consistent features in models across time. However, prevexceed does not form part of the best model very frequently, except in higher model subspaces, possibly because it is less informative in the presence of the aforementioned covariates.
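The frequency count described above is straightforward to compute once the best model of each subspace has been recorded for each day. In this Python sketch the input format (one dictionary per day, mapping the subspace size $v$ to the set of covariate names in the best model) is an assumption, and the two-day example is invented for illustration.

```python
from collections import Counter, defaultdict


def selection_frequency(best_models_by_day):
    """Fraction of days on which each covariate is in the best model of each subspace.

    best_models_by_day: iterable with one dict per day, mapping the subspace
    size v to the set of covariate names in that day's best v-covariate model.
    """
    counts = defaultdict(Counter)
    n_days = Counter()
    for day in best_models_by_day:
        for v, covariates in day.items():
            counts[v].update(covariates)
            n_days[v] += 1
    return {v: {name: c / n_days[v] for name, c in counts[v].items()} for v in counts}


days = [  # two invented days of best-subset results
    {1: {"prevTEDavg"}, 2: {"prevTEDavg", "spreads"}},
    {1: {"prevTEDavg"}, 2: {"prevTEDavg", "mobuy"}},
]
freq = selection_frequency(days)
print(freq[1])  # {'prevTEDavg': 1.0}
```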

Besides the frequency of the presence of each covariate in the best fitting model of a given subspace, we also evaluate individual covariate significance over time via a formal partial t-test at the 5% level in Figure. In higher model subspaces, we find that several covariates are statistically significant (i.e. the null hypothesis of a zero coefficient is rejected by a partial t-test) less frequently. This is what one may expect when covariates become less significant in the presence of other correlated covariates, i.e. when collinearity among the LOB covariates takes effect. To validate this hypothesis regarding the model structures we develop, we have performed further analysis on the correlation between the covariates and its effect on our estimated coefficients, which can be provided on request.

In this analysis, we recall that under our regression framework, the sign of the coefficient for a given covariate indicates the direction of the partial effect of this variable on the conditional probability that the resilience, as measured by the exceedance duration for a given threshold, will exceed a time $t$. Therefore we can interpret positive coefficient values as reducing the liquidity resilience of the LOB by slowing the return to a desirable level, whilst negative coefficients tend to result in a rapid return to the considered liquidity level, indicating marginally higher resilience with respect to that covariate. Panayi et al. (2014) provide an economic interpretation of the significant covariates for the case of the lognormal model, and we provide in Section 6.5 a discussion regarding the sign and variation in coefficients for the different model structures we considered.


Covariate        % significant   % positive
askAge                48.2          22.4
askModified           62.0          50.2
askVolume             52.3          27.6
bid                   58.2          34.4
bidAge                51.2          23.2
bidModified           59.5          46.6
bidVolume             54.6          27.7
indact                52.6          35.6
lask                  61.0          12.1
laskAge               49.0          22.3
laskModified          65.6           7.9
laskVolume            53.0          29.7
lbid                  63.5          12.4
lbidAge               45.5          20.7
lbidModified          63.7           9.3
lbidVolume            53.1          30.0
lspreads              71.6          58.9
mobuy                 84.3          36.5
mosell                83.5          35.7
prevTEDavg            96.3          96.2
prevexceed            69.9           7.3
spreads               75.2          70.5
timelast              57.6           8.5

Table 2: The percentage of daily models (for all assets) for which each covariate is found to be significant at the 5% level, and the percentage of daily models for which the sign of the associated coefficient is positive.
In Table 2 we summarise the significance and sign of the covariates over time and over the different assets in our dataset. This is interesting as we can identify the regressors for which loadings are positive (negative) and thus produce marginal increases (decreases) in the expected TED under the model, and thus an associated decrease (increase) in the resilience of market liquidity.
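The two columns of Table 2 can be reproduced from per-day regression output along the following lines. This Python sketch assumes each daily fit has been reduced to a dictionary mapping a covariate name to its (coefficient, p-value) pair, and it computes both percentages over the daily models that include the covariate; the toy values are invented.

```python
from collections import Counter


def significance_summary(daily_fits, level=0.05):
    """Percentage of daily models in which each covariate is significant / positive.

    daily_fits: list with one dict per daily model, mapping covariate name to
    (coefficient, p_value). Percentages are over the models containing the covariate.
    """
    n, n_sig, n_pos = Counter(), Counter(), Counter()
    for fit in daily_fits:
        for name, (coef, p_value) in fit.items():
            n[name] += 1
            n_sig[name] += p_value < level
            n_pos[name] += coef > 0
    return {name: (100.0 * n_sig[name] / n[name], 100.0 * n_pos[name] / n[name]) for name in n}


fits = [  # invented (coefficient, p-value) pairs for four daily models
    {"spreads": (0.3, 0.01), "prevexceed": (-0.2, 0.2)},
    {"spreads": (0.1, 0.30), "prevexceed": (-0.5, 0.001)},
    {"spreads": (-0.05, 0.04), "prevexceed": (0.1, 0.5)},
    {"spreads": (0.2, 0.02), "prevexceed": (-0.3, 0.03)},
]
print(significance_summary(fits))  # {'spreads': (75.0, 75.0), 'prevexceed': (50.0, 25.0)}
```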

In the following, we consider a fixed subset of covariates in the regression models, which we identified in our previous model selection procedure as being most significant in the daily regressions, as well as having a consistent sign. These are:


• prevTEDavg 

• spreads 

• prevexceed 

• mobuy, mosell 

• ask, bid 

• lask, lbid


6.2 Model selection - distributional assumptions 

We now consider the effect of different distributional assumptions on the explanatory power of the 
model, using the fixed subset of covariates selected above. We will compare the explanatory power 
of the lognormal, Weibull, gamma and Generalised gamma regression models. We will first relate 
the covariates to the mean of the response, as in the GLM structure, before considering separate 
link functions for further distribution parameters, as in the GAMLSS structure. 


Distribution          Link for μ    Link for σ    Link for ν
Lognormal             identity      log           -
Gamma                 log           log           -
Weibull               log           log           -
Generalised Gamma     log           log           identity

Table 3: The link functions in the GAMLSS framework for each parameter for the four distributions under consideration.


Figure shows the range of adjusted $R^2$ values obtained from daily fits of each model over the 4-month period for the regression models with the different distributional assumptions. We note that, in general, making lognormal and Weibull distributional assumptions leads to regression models with comparable explanatory power, whereas the explanatory power of gamma regression models is lower for the vast majority of assets. This indicates that the tail behaviour of the liquidity resilience measure tends to be better fit by moderate- to heavy-tailed distributions which admit more flexible skew and kurtosis features.

We present in Figure, for two assets, the estimated deviance of the fitted models every day, for all 4 distributional assumptions that we make. We see that, in general, the Generalised Gamma produces model fits with the lowest deviance values, which is as one would expect, as it encompasses all the other distributions as special or limiting cases. We also observed that there are a few days where the Generalised Gamma regression model failed to converge, and in these cases the Lognormal dominates. Over our dataset, we find that the Generalised Gamma model is the best performing model for approximately 75% of the daily datasets, while for most of the remaining cases (which would mostly include daily datasets where the Generalised Gamma model failed to converge), the best model is the Lognormal one.


Distribution          Lowest deviance
Lognormal             24.6%
Gamma                  0.0%
Weibull                0.8%
Generalised Gamma     74.6%

Table 4: The percentage of daily datasets for which each model fit produced the lowest deviance.
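The percentages in Table 4 amount to counting, for each day, which converged fit attains the smallest deviance. The Python sketch below (the input format and the toy deviance values are our own) records a failed fit as None, so that on such days the comparison runs over the remaining models.

```python
from collections import Counter


def lowest_deviance_shares(daily_deviances):
    """Percentage of days on which each model attains the lowest deviance.

    daily_deviances: list with one dict per day, mapping model name to its
    deviance, or to None when the fit failed to converge.
    """
    wins = Counter()
    for day in daily_deviances:
        converged = {m: d for m, d in day.items() if d is not None}
        if converged:
            wins[min(converged, key=converged.get)] += 1
    return {m: 100.0 * c / len(daily_deviances) for m, c in wins.items()}


days = [  # invented deviances for four days
    {"Lognormal": 10.0, "Gamma": 12.0, "Weibull": 11.0, "Generalised Gamma": 9.5},
    {"Lognormal": 8.0, "Gamma": 9.0, "Weibull": 8.5, "Generalised Gamma": None},
    {"Lognormal": 7.0, "Gamma": 7.5, "Weibull": 7.2, "Generalised Gamma": 6.9},
    {"Lognormal": 5.0, "Gamma": 6.0, "Weibull": 5.5, "Generalised Gamma": 4.0},
]
print(lowest_deviance_shares(days))  # {'Generalised Gamma': 75.0, 'Lognormal': 25.0}
```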


6.3 Incorporating more flexibility through a GAMLSS framework specification

The advantage of the GAMLSS framework is that one is able to relate regression covariates to every distribution parameter through different link functions. Within the gamlss package, the distributions are reparameterised (as explained earlier) so that they have common parameters $\mu$, $\sigma$, and possibly $\nu$ and $\kappa$. For the Lognormal model the default link function for $\mu$ is the identity link

$$ g(\mu) = \mu = x_1^\top \beta_1, $$

while for most other parameters the log link is used, e.g. for $\sigma$

$$ h(\sigma) = \log(\sigma) = x_2^\top \beta_2. $$

As we do not know a priori whether covariates are more important in affecting one distribution parameter than another, we use a common set of covariates for each link function, and therefore in the expressions for the link functions above $x_1 = x_2 = x$. We present in Figure the explanatory power of the Lognormal regression model when considering only a single link function for $\mu$, and when considering an additional link function for $\sigma$ also. We note that there is a slight increase in the median adjusted $R^2$, and this is observed across the assets in our dataset.

6.4 Effects of unit changes in LOB dynamics on the TED 

In this section we consider how to study the influence on the TED arising from a unit change in the statistically most important covariates, given by: prevTEDavg; spreads; prevexceed; mobuy, mosell; ask, bid; lask, lbid. This is interesting to study as the influence depends on the distributional choice and model structure. We study the perturbation effect of a unit change in one covariate in the GAMLSS model, holding all the other covariates fixed, on the mean and variance functions of the model. This will allow us to interpret the influence of the sign and magnitude of the coefficient loadings in the model for each covariate on the average TED (replenishment time) and on the variance of the TED for an asset.
6.4.1 Lognormal mean and variance functions in the GAMLSS framework

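In the case of the Lognormal model with a single link function on $\mu$, $\mu(x) = \beta^\top x$, we know that the mean and variance functions are given as follows:

$$ \mathrm{E}[\tau \mid x] = \exp\left(\mu(x) + \sigma^2/2\right), \qquad \mathrm{Var}[\tau \mid x] = \exp(\sigma^2)\left[\exp(\sigma^2) - 1\right] \exp\left(2\mu(x)\right). \tag{18} $$

We can therefore consider the influence of a unit change in a covariate under this model by considering the partial derivatives of the mean and variance functions with respect to, say, the $j$-th covariate, which are given by

$$ \frac{\partial}{\partial x_j} \mathrm{E}[\tau \mid x] = \beta_j \exp\left(\mu(x) + \sigma^2/2\right), \qquad \frac{\partial}{\partial x_j} \mathrm{Var}[\tau \mid x] = 2\beta_j \exp(\sigma^2)\left[\exp(\sigma^2) - 1\right] \exp\left(2\mu(x)\right). \tag{19} $$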


From this analysis one sees that a unit increase in the $j$-th covariate $x_j$ with a negative coefficient loading will produce an increase in the mean liquidity resilience by reducing the average TED. Conversely, a positive loading will result in a decrease in the mean liquidity resilience.
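The derivatives above can be sanity-checked against central finite differences. The Python sketch below uses the standard lognormal moments, $\mathrm{E}[\tau \mid x] = \exp(\mu + \sigma^2/2)$ and $\mathrm{Var}[\tau \mid x] = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}$, with the identity link $\mu(x) = \beta^\top x$ and a constant $\sigma$; the vector $x$ includes a leading 1 for the intercept, and all numbers are hypothetical.

```python
import math


def lognormal_mean_var(x, beta, sigma):
    """Mean and variance of a lognormal TED with mu(x) = beta'x and constant sigma."""
    mu = sum(b * v for b, v in zip(beta, x))
    mean = math.exp(mu + sigma ** 2 / 2)
    var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, var


def analytic_partials(x, beta, sigma, j):
    """d mean / dx_j = beta_j * mean and d var / dx_j = 2 * beta_j * var."""
    mean, var = lognormal_mean_var(x, beta, sigma)
    return beta[j] * mean, 2 * beta[j] * var


def numerical_partials(x, beta, sigma, j, h=1e-6):
    """Central finite differences of (mean, var) with respect to x_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    (m1, v1), (m0, v0) = lognormal_mean_var(up, beta, sigma), lognormal_mean_var(down, beta, sigma)
    return (m1 - m0) / (2 * h), (v1 - v0) / (2 * h)


x, beta, sigma = [1.0, 0.8, -0.4], [0.2, -0.5, 0.3], 0.6
print(analytic_partials(x, beta, sigma, j=1))
print(numerical_partials(x, beta, sigma, j=1))
```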

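In the case of a Lognormal model with two link functions, assuming both parameters are related to the same set of covariates in the vector $x$, one has $g(\mu(x)) = \mu(x) = \beta^\top x$ and $h(\sigma(x)) = \log(\sigma(x)) = \alpha^\top x$. Using the first-order expansion $\exp(2\alpha^\top x) \approx 1 + 2\alpha^\top x$ (accurate when $\alpha^\top x$ is close to zero) in the logarithm of the mean,

$$ \ln \mathrm{E}[\tau \mid x] = \beta^\top x + \frac{\exp(2\alpha^\top x)}{2} \approx \beta^\top x + \alpha^\top x + \frac{1}{2}, \tag{20} $$

results in the following approximate relationship for the partial derivative with respect to a covariate $x_j$:

$$ \frac{\partial}{\partial x_j} \mathrm{E}[\tau \mid x] \approx (\beta_j + \alpha_j) \exp\left(\beta^\top x + \alpha^\top x + \frac{1}{2}\right). \tag{21} $$

Hence, to this order of approximation, if $\beta_j + \alpha_j > 0$ then a unit increase in covariate $x_j$ will result in an increase in the average TED. Conversely, if $\beta_j + \alpha_j < 0$ then an increase in $x_j$ will decrease the average TED, and in the third case, $\beta_j + \alpha_j = 0$, changes in the covariate have no effect on liquidity resilience, as measured by the TED.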
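The quality of the first-order approximation depends on how far $\alpha^\top x$ is from zero. The Python sketch below compares the exact derivative of $\mathrm{E}[\tau \mid x] = \exp(\beta^\top x + e^{2\alpha^\top x}/2)$, namely $(\beta_j + \alpha_j e^{2\alpha^\top x})\,\mathrm{E}[\tau \mid x]$, with the approximation $(\beta_j + \alpha_j)\exp(\beta^\top x + \alpha^\top x + 1/2)$; the two coincide when $\alpha^\top x = 0$ and can differ greatly otherwise. All coefficient values are hypothetical.

```python
import math


def dot(coef, x):
    return sum(c * v for c, v in zip(coef, x))


def mean_two_links(x, beta, alpha):
    """E[tau|x] = exp(beta'x + sigma(x)^2 / 2) with log sigma(x) = alpha'x."""
    return math.exp(dot(beta, x) + math.exp(2 * dot(alpha, x)) / 2)


def exact_partial(x, beta, alpha, j):
    """Exact derivative (beta_j + alpha_j * sigma(x)^2) * E[tau|x]."""
    sigma2 = math.exp(2 * dot(alpha, x))
    return (beta[j] + alpha[j] * sigma2) * mean_two_links(x, beta, alpha)


def approx_partial(x, beta, alpha, j):
    """First-order approximation (beta_j + alpha_j) * exp(beta'x + alpha'x + 1/2)."""
    return (beta[j] + alpha[j]) * math.exp(dot(beta, x) + dot(alpha, x) + 0.5)


x, beta = [1.0, 2.0], [0.1, 0.3]
print(exact_partial(x, beta, [0.4, -0.2], 1), approx_partial(x, beta, [0.4, -0.2], 1))  # equal
print(exact_partial(x, beta, [1.0, 0.5], 1), approx_partial(x, beta, [1.0, 0.5], 1))    # far apart
```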


6.4.2 Gamma mean and variance functions in the GAMLSS framework 

In the case of the Gamma model with link functions $g(\mu(x)) = \log(\mu(x)) = \beta^\top x$ and $h(\sigma(x)) = \log(\sigma(x)) = \alpha^\top x$, the mean and variance functions are given as follows:

$$ \mathrm{E}[\tau \mid x] = \mu(x) = \exp\Big(\sum_j \beta_j x_j\Big), \qquad \mathrm{Var}[\tau \mid x] = \mu^2(x)\, \sigma^2(x) = \exp\Big(2\sum_j \beta_j x_j + 2\sum_j \alpha_j x_j\Big). $$

We can therefore consider the influence of a unit change in a covariate under this model by considering the partial derivatives of the mean and variance functions with respect to, say, the $j$-th covariate:

$$ \frac{\partial}{\partial x_j} \mathrm{E}[\tau \mid x] = \beta_j \exp\Big(\sum_l \beta_l x_l\Big), \qquad \frac{\partial}{\partial x_j} \mathrm{Var}[\tau \mid x] = 2(\beta_j + \alpha_j) \exp\Big(2\sum_l \beta_l x_l + 2\sum_l \alpha_l x_l\Big). $$

We see from this analysis that a unit increase in the variable $x_j$ when $\beta_j > 0$ results in an increase in the average TED, and in an increase in the variance of the TED when $\beta_j + \alpha_j > 0$.

6.4.3 Weibull and Generalised Gamma mean and variance functions in the GAMLSS 
framework 

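The Weibull distribution, parameterised as in (6), has mean and variance functions

$$ \mathrm{E}[\tau \mid x] = \mu(x) = \exp\Big(\sum_j \beta_j x_j\Big), \qquad \mathrm{V}[\tau \mid x] = \mu^2(x) \left\{ \Gamma\left(\frac{2}{\sigma(x)} + 1\right) \left[\Gamma\left(\frac{1}{\sigma(x)} + 1\right)\right]^{-2} - 1 \right\}. $$

The generalised gamma distribution, parameterised as in (13), has mean and variance functions

$$ \mathrm{E}[\tau \mid x] = \mu(x)\, \frac{\Gamma(\theta + 1/\nu)}{\theta^{1/\nu}\, \Gamma(\theta)}, \qquad \mathrm{V}[\tau \mid x] = \mu^2(x)\, \frac{\Gamma(\theta)\,\Gamma(\theta + 2/\nu) - \left[\Gamma(\theta + 1/\nu)\right]^2}{\theta^{2/\nu} \left[\Gamma(\theta)\right]^2}, $$

where $\theta = 1/(\sigma^2(x)\, \nu^2(x))$. We can obtain the partial derivatives of the mean and variance functions as in the lognormal and Gamma cases above. However, the presence of the gamma functions makes the identification of the partial contributions of unit changes in covariates to the mean and variance functions more involved.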
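Because the gamma-function terms make closed-form partial effects cumbersome, a practical alternative is to differentiate the mean and variance functions numerically. The Python sketch below does this for the Weibull model (log links for $\mu$ and $\sigma$) and for the generalised gamma model with log links for $\mu$ and $\sigma$ and, as a simplification of our own, $\nu$ held fixed rather than modelled through its identity link. Setting $\nu = 1$ recovers the gamma results of Section 6.4.2, which serves as a check; the coefficient values are hypothetical.

```python
import math


def _dot(coef, x):
    return sum(c * v for c, v in zip(coef, x))


def weibull_mean_var(x, beta, alpha):
    """Weibull (6) with log mu(x) = beta'x and log sigma(x) = alpha'x."""
    mu, s = math.exp(_dot(beta, x)), math.exp(_dot(alpha, x))
    return mu, mu ** 2 * (math.gamma(2 / s + 1) / math.gamma(1 / s + 1) ** 2 - 1)


def gg_mean_var(x, beta, alpha, nu):
    """Generalised gamma (13): log links for mu and sigma; nu > 0 held fixed."""
    mu, sigma = math.exp(_dot(beta, x)), math.exp(_dot(alpha, x))
    theta = 1 / (sigma ** 2 * nu ** 2)
    # log Gamma(theta), log Gamma(theta + 1/nu), log Gamma(theta + 2/nu)
    g0, g1, g2 = (math.lgamma(theta + i / nu) for i in range(3))
    mean = mu * math.exp(g1 - g0) * theta ** (-1 / nu)
    var = mu ** 2 * theta ** (-2 / nu) * (math.exp(g2 - g0) - math.exp(2 * (g1 - g0)))
    return mean, var


def partial_effects(mean_var, x, j, h=1e-6, **params):
    """Central-difference derivatives of (mean, variance) with respect to x_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    (m1, v1), (m0, v0) = mean_var(up, **params), mean_var(down, **params)
    return (m1 - m0) / (2 * h), (v1 - v0) / (2 * h)


x = [1.0, 0.5]                              # leading 1 for the intercept
beta, alpha = [0.3, 0.8], [-0.6, 0.2]       # hypothetical coefficients
print(partial_effects(weibull_mean_var, x, 1, beta=beta, alpha=alpha))
print(partial_effects(gg_mean_var, x, 1, beta=beta, alpha=alpha, nu=1.3))
```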


6.5 Interpretation 


Since we have obtained model fits for every model subspace, and for every day in our dataset, we can investigate the inter-day variation of the coefficients, as well as their magnitude and sign over time. In Figure we summarise these results for the best fitting model on each day. The plots demonstrate, for each model distribution assumption, the following features: 1) the variation in each coefficient in the link functions for $\mu$ and $\sigma$, and 2) the coefficient sign, and thus its interpretation with regard to how it influences these parameters, which are generally related to the resilience mean and variance.

We note that the signs of the coefficients generally agree for the models under the different distributional assumptions. In particular, the prevTEDavg covariate, which is an average of the last 5 log TED observations, generally has a positive coefficient and is thus associated with a slower return to the threshold liquidity level. Thus, our model indicates that the expected TED over a particular threshold will be larger when the duration of similar exceedances in the near past has been longer. We also find that the instantaneous spread covariate (i.e. the value of the spread at the moment when it first exceeds the threshold) appears frequently in the best model and has a positive coefficient (and would thus also increase the expected TED). This result matches our intuition, as the wider the spread just after an event at time $T_i^{+}$, the longer we would expect the spread exceedance to last, on average.

Of particular interest are the mobuy and mosell covariates, i.e. dummy variables indicating whether the exceedance resulted from a buy or sell market order respectively (if both are zero, then the exceedance was a result of a cancellation). For the majority of assets, such as Deutsche Telekom, for which results are presented in Figure 10, the coefficients are generally found to be negative, indicating that exceedances from market orders are associated with a decrease in the expected TED, compared to cancellations. For a small number of assets, such as Credit Agricole, we have noted that the opposite effect is found.

7 Liquidity drought extremes 

We present here an application of the model as a regulatory tool for the monitoring of liquidity. 
Similar to the previous setting, we would expect that regulatory bodies are interested in ensuring 
uninterrupted liquidity, as it is an integral part of a fair and orderly market. However, they would 
probably focus on the extreme liquidity levels that occur, and the durations of these extreme events. 

For this application, instead of the conditional mean response of the observation variable, we now consider conditional quantiles of the response. That is, if a TED event occurs in a (stationary) LOB regime, given covariates $x$, we can make a prediction about the $(1 - \alpha)$-th quantile, e.g. the 90th percentile of the response: this is the duration of time such that there is a 90% probability under the model that liquidity will return to the threshold level within this period. To understand the use of the model we developed in the manner of a quantile relationship, we explain briefly how to reinterpret the GAMLSS regression model, specifically in the case of the generalised gamma distribution family we adopt in this paper, as a quantile regression model structure, following the developments discussed in Noufaily and Jones (2013).


In general, a quantile regression study links the quantile behaviour of an observed response
variable (in our case the TED random variables for liquidity resilience) to a set of covariates,
within either a non-parametric or a parametric framework. The most common approach is
non-parametric quantile regression, where one estimates regression coefficients without making
assumptions on the distribution of the response, or equivalently of the residuals. If $T_i > 0$,
$i = 1, \ldots, I$, is a set of observations and $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{im})$ is a vector
of covariates that describe $T_i$, the quantile function for the log-transformed data
$Y_i^* = \ln T_i \in \mathbb{R}$ is

$$Q_{Y^*}(u \mid \mathbf{x}_i) = \alpha_{0,u} + \sum_{k=1}^{m} \alpha_{k,u} x_{ik} \qquad (22)$$

where $u \in (0,1)$ is the quantile level and $\boldsymbol{\alpha}_u = (\alpha_{0,u}, \ldots, \alpha_{m,u})$ are the
linear model coefficients for quantile level $u$, which are estimated by solving

$$\min_{\boldsymbol{\alpha}_u} \sum_{i=1}^{I} \rho_u(\epsilon_i) = \min_{\boldsymbol{\alpha}_u} \sum_{i=1}^{I} \epsilon_i \left[u - \mathbb{I}(\epsilon_i < 0)\right] \qquad (23)$$

where $\epsilon_i = y_i^* - \alpha_{0,u} - \sum_{k=1}^{m} \alpha_{k,u} x_{ik}$. Then the quantile function for the
original data is $Q_T(u \mid \mathbf{x}_i) = \exp\left(Q_{Y^*}(u \mid \mathbf{x}_i)\right)$.
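The estimation problem (23) is a linear programme. The following is a minimal sketch, assuming NumPy and SciPy's `linprog` (HiGHS solver) are available; splitting each residual into positive and negative parts is the standard LP formulation of the check-loss problem, not necessarily how the estimates in this paper were computed, and the synthetic data are purely illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(e, u):
    """Total check loss sum_i rho_u(e_i), with rho_u(e) = e * (u - I(e < 0))."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(e * (u - (e < 0).astype(float))))


def fit_quantile_regression(X, y, u):
    """Solve problem (23) as a linear programme.

    X : (I, m+1) design matrix whose first column is the intercept.
    y : (I,) log-transformed responses y_i* = ln T_i.
    Each residual is split as e_i = e_plus_i - e_minus_i with both parts >= 0.
    """
    n, p = X.shape
    # Decision vector: [alpha_u (free), e_plus (>= 0), e_minus (>= 0)]
    c = np.concatenate([np.zeros(p), np.full(n, u), np.full(n, 1.0 - u)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # X alpha + e_plus - e_minus = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Synthetic example (not TED data): ln T = 1 + 2x + 0.5 * N(0, 1)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
T = np.exp(1.0 + 2.0 * x + 0.5 * rng.standard_normal(n))
X = np.column_stack([np.ones(n), x])
y = np.log(T)
alpha_90 = fit_quantile_regression(X, y, 0.9)
q_T = np.exp(X @ alpha_90)   # Q_T(0.9 | x_i) = exp(Q_{Y*}(0.9 | x_i))
```

At the optimum roughly 90% of the log-responses lie at or below the fitted line, which is the defining property of the 0.9 regression quantile.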
It was realised by Koenker and Machado 1999 and Yu and Moyeed 2001 that the parameter
estimates of $\boldsymbol{\alpha}_u$ obtained by minimising the loss function in (23) are equivalent to the
maximum likelihood estimates of $\boldsymbol{\alpha}_u$ when $Y_i^*$ follows the Asymmetric Laplace (AL)
proxy distribution with pdf

$$f(y_i^*; \mu_i, \sigma_i, p) = \frac{p(1-p)}{\sigma_i} \exp\left\{-\frac{y_i^* - \mu_i}{\sigma_i}\left[p - \mathbb{I}(y_i^* < \mu_i)\right]\right\} \qquad (24)$$

where the location parameter or mode $\mu_i$ equals $Q_{Y^*}(u \mid \mathbf{x}_i)$ in (22), the scale parameter
$\sigma_i > 0$, and the skewness parameter $p \in (0,1)$ equals the quantile level $u$. Since the
exponent of the pdf (24) is the loss function (23) scaled by $-1/\sigma_i$, parameter estimates which
maximise (24) will minimise (23).
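The equivalence between (23) and (24) can be checked numerically: for a fixed scale $\sigma$, the AL log-likelihood equals $I \log\{p(1-p)/\sigma\}$ minus the total check loss divided by $\sigma$. A minimal NumPy sketch, on simulated responses that are not TED data:

```python
import numpy as np


def check_loss(e, u):
    """Elementwise check loss rho_u(e) = e * (u - I(e < 0))."""
    e = np.asarray(e, dtype=float)
    return e * (u - (e < 0).astype(float))


def al_logpdf(y, mu, sigma, p):
    """Log of the asymmetric Laplace density (24): location mu, scale sigma, skewness p."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return np.log(p * (1 - p) / sigma) - check_loss(z, p)


rng = np.random.default_rng(1)
y = rng.normal(size=50)
u, sigma = 0.75, 2.0
# Profile both criteria over a grid of candidate locations mu.
mus = np.linspace(-2.0, 2.0, 401)
loglik = [al_logpdf(y, m, sigma, u).sum() for m in mus]
loss = [check_loss(y - m, u).sum() for m in mus]
best_mu_ll = mus[int(np.argmax(loglik))]     # maximiser of the AL likelihood
best_mu_loss = mus[int(np.argmin(loss))]     # minimiser of the check loss
```

Because $\rho_u$ is positively homogeneous, $\rho_u(\epsilon/\sigma) = \rho_u(\epsilon)/\sigma$, which is why the two profiles differ only by an affine transformation.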


In this formulation the AL distribution represents the conditional distribution of the observed
dependent variables (responses) given the covariates. More precisely, the location parameter $\mu_i$ of
the AL distribution links the coefficient vector $\boldsymbol{\alpha}_u$ and the associated covariates
in the linear regression model to the location of the AL distribution. It is also worth noting that
under this representation it is straightforward to extend the quantile regression model to allow for
heteroscedasticity in the response, which may vary as a function of the quantile level $u$ under study.
To achieve this, one can simply add a regression structure linked to the scale parameter $\sigma_i$ in the
same manner as was done for the location parameter.

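Equivalently, we assume $Y_i^*$ conditionally follows an AL distribution, denoted by
$Y_i^* \sim \mathrm{AL}(\mu_i^*, \sigma_i, u)$. Then

$$Y_i^* = \mu_i^* + \epsilon_i^* \sigma_i \qquad (25)$$

where $\epsilon_i^* \sim \mathrm{AL}(0, 1, u)$, $\mu_i^* = \alpha_{0,u} + \sum_{k=1}^{m} \alpha_{k,u} x_{ik}$,
$\sigma_i = \exp\left(\beta_{0,u} + \sum_{k} \beta_{k,u} s_{ik}\right)$, and the $s_{ik}$ are covariates
in the variance function.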
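A short simulation illustrates (25): whatever the scale function, the location $\mu_i^*$ remains the $u$-th conditional quantile of $Y_i^*$. This sketch draws standard AL variates as $E_1/u - E_2/(1-u)$ with $E_1, E_2$ independent unit exponentials (a standard representation, assumed here); the covariates and coefficients are hypothetical.

```python
import numpy as np


def rand_al_standard(p, size, rng):
    """Draw eps ~ AL(0, 1, p) as E1/p - E2/(1-p), with E1, E2 iid Exp(1)."""
    e1 = rng.exponential(1.0, size)
    e2 = rng.exponential(1.0, size)
    return e1 / p - e2 / (1 - p)


rng = np.random.default_rng(2)
n, u = 200_000, 0.9
x = rng.uniform(0.0, 1.0, n)     # location covariate x_i1 (hypothetical)
s = rng.uniform(0.0, 1.0, n)     # variance covariate s_i1 (hypothetical)
alpha0, alpha1 = 7.0, 0.5        # illustrative coefficients, not estimates
beta0, beta1 = -1.0, 1.0
mu = alpha0 + alpha1 * x
sigma = np.exp(beta0 + beta1 * s)                 # heteroscedastic scale
y_star = mu + rand_al_standard(u, n, rng) * sigma  # model (25)
# mu is the u-th conditional quantile regardless of sigma:
frac_below = np.mean(y_star <= mu)
```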

One could indeed consider the resulting ALD model as a GAMLSS model structure which is also
interpreted as a quantile regression model. However, there is another sub-class of models for
which one can develop a GAMLSS model that is also associated with a quantile regression
structure, not necessarily in the ALD family. In this paper we again consider the generalised gamma
distribution family of GAMLSS structures, and we observe that the quantile function of this family
of models can be obtained in closed form, which is again a form of quantile regression since it
directly relates the quantile function of the TED response to the covariates.

In particular, suppose the TED responses are modelled according to the GAMLSS regression
structure under one of the parameterisations discussed previously, such as the Generalised Gamma
distribution

$$f_T\left(t; b(\mathbf{x}), a(\mathbf{x}), k(\mathbf{x})\right) = \frac{b\, t^{bk-1}}{a^{bk}\,\Gamma(k)} \exp\left\{-\left(\frac{t}{a}\right)^{b}\right\}, \quad t > 0,\ k > 0,\ a > 0,\ b > 0 \qquad (26)$$

where the parameters $b$, $a$, $k$ can be made functions of the covariates $\mathbf{x}$ under the GAMLSS
structure. From this regression relationship, one can obtain the conditional quantile function for a
given quantile level $u$, as determined by Noufaily and Jones 2013. This quantile function of the
response given the covariates, which enter through the parameters, is obtained by representing the
log generalised gamma distribution's quantile function in terms of a base quantile function, in this
case that of a gamma distribution with specifically selected shape and scale parameters. Since
$(T/a)^b$ follows a gamma distribution with shape $k$ and unit scale, transforming the analytic
closed-form quantile function of a Gamma random variable, denoted by $G^{-1}(u; k(\mathbf{x}))$ for
quantile level $u$, shape $k(\mathbf{x})$ and unit scale, gives the conditional expression

$$Q(u; \mathbf{x}) = a(\mathbf{x}) \left[G^{-1}\left(u; k(\mathbf{x})\right)\right]^{1/b(\mathbf{x})} \qquad (27)$$
where we recall that this parameterisation corresponds to that of the gamlss package with
$b = \nu$, $a = \mu / \theta^{1/\nu}$ and $k = \theta$, with $\theta = 1/(\sigma^2 \nu^2)$, and where one or
more of $\mu$, $\sigma$, $\nu$ may be functions of $\mathbf{x}$, since one or more of the parameters
$k$, $b$, $a$ can be made functions of the covariates $\mathbf{x}$.
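Equation (27) can be evaluated with any gamma quantile routine. A minimal sketch, assuming SciPy and $b > 0$ (equivalently $\nu > 0$); the helper names `gg_quantile` and `gamlss_gg_to_bak` are hypothetical and simply implement (27) and the parameter mapping stated above.

```python
import numpy as np
from scipy.stats import gamma, gengamma


def gg_quantile(u, b, a, k):
    """Quantile (27) of the generalised gamma density (26).

    Uses (T/a)**b ~ Gamma(shape=k, scale=1), so Q(u) = a * G^{-1}(u; k)**(1/b).
    Assumes b > 0.
    """
    return a * gamma.ppf(u, k) ** (1.0 / b)


def gamlss_gg_to_bak(mu, sigma, nu):
    """Map gamlss GG parameters (mu, sigma, nu), nu > 0, to (b, a, k) of (26)."""
    theta = 1.0 / (sigma**2 * nu**2)
    return nu, mu / theta ** (1.0 / nu), theta


# Independent check: SciPy's gengamma(a=k, c=b, scale=a) is the same family as (26).
q_ours = gg_quantile(0.9, 1.5, 2.0, 3.0)
q_scipy = gengamma.ppf(0.9, 3.0, 1.5, scale=2.0)
```

Setting $b = 1$ recovers the gamma quantile and $k = 1$ the Weibull quantile $a\{-\ln(1-u)\}^{1/b}$, which is a convenient way to sanity-check an implementation.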

To start with, we obtain conditional quantiles of the TED, assuming that the covariates take their
median intra-day values. The TED is defined as before, using the spread as the liquidity measure
and the median of the empirical distribution as the threshold. We then allow the covariate prevTEDavg
to vary within a range of typical intra-day values and obtain the conditional quantile levels for the
four distributions we have considered in Figure 11. We note here that these quantile functions have
been obtained with model parameters from a GAMLSS structure in which we only considered the
link function pertaining to the first parameter. We can see that this structure separates the effects
of covariates and quantile levels on the quantile of the TED, as covariates only enter through parameter
$a(\mathbf{x})$ in (27).
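With a link function on the first parameter only, the log-quantile of a lognormal TED is $\eta(\mathbf{x}) + \sigma z_u$, so covariate effects and the quantile level enter additively on the log scale. A sketch of quantile curves in prevTEDavg, using hypothetical coefficients and a hypothetical standardised covariate grid rather than fitted values:

```python
from math import exp
from statistics import NormalDist


def lognormal_quantile(u, mu, sigma):
    """Q(u) = exp(mu + sigma * z_u) for a lognormal TED with log-scale mean mu."""
    return exp(mu + sigma * NormalDist().inv_cdf(u))


# Hypothetical fitted model: link function on mu only, sigma constant.
beta0, beta_prev = 7.5, 0.4               # illustrative values, not estimates
sigma = 0.8
prev_ted_grid = [0.0, 0.5, 1.0, 1.5, 2.0]  # standardised prevTEDavg values (assumed)
levels = [0.5, 0.9, 0.99]
curves = {
    u: [lognormal_quantile(u, beta0 + beta_prev * x, sigma) for x in prev_ted_grid]
    for u in levels
}
```

The ratio of any two quantile curves, for example $Q(0.9; x)/Q(0.5; x) = \exp(\sigma z_{0.9})$, does not depend on the covariate value, which is the separation described above.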

Figures 12 and 13 show the quantile surfaces obtained when we allow two covariates to vary. We
note from Figure 12 that when both covariates prevTEDavg and spreads take extreme values, the
median TED level increases sharply under our model for all distributional assumptions.

Such an analysis is useful for understanding how extreme levels of particular covariates affect
quantile levels of the TED. In particular, we can see how the different quantile surfaces for the
TED behave for spread values at and beyond these extreme levels. This enables regulators to identify
the covariates most strongly associated with an increase in extreme periods of illiquidity
(that is, where the liquidity measure remains above the threshold for extended periods of time). In
addition, to the extent that a covariate taking extreme values is considered a scenario in which the
LOB is stressed, a regulator can make inferences about the duration of relative illiquidity under
such stressed conditions.
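A quantile surface over two covariates can be tabulated on a grid. This sketch assumes a Weibull TED with a log link on the scale parameter in two hypothetical standardised covariates (prevTEDavg and the spread); the coefficients are illustrative, not estimates from our data.

```python
import numpy as np


def weibull_quantile(u, scale, shape):
    """Q(u) = scale * (-log(1 - u))**(1 / shape) for a Weibull TED."""
    return scale * (-np.log1p(-u)) ** (1.0 / shape)


# Hypothetical coefficients for a log link on the Weibull scale parameter.
beta0, beta_prev, beta_spread = 7.0, 0.3, 0.6
shape = 1.2
prev = np.linspace(-1.0, 3.0, 41)     # prevTEDavg, standardised (assumed)
spread = np.linspace(-1.0, 3.0, 41)   # spread, standardised (assumed)
P, S = np.meshgrid(prev, spread, indexing="ij")
scale = np.exp(beta0 + beta_prev * P + beta_spread * S)
surface_99 = weibull_quantile(0.99, scale, shape)   # 41 x 41 grid of 99% TED quantiles
# Ratio between the most extreme corner and central (zero) covariate values:
centre = weibull_quantile(0.99, np.exp(beta0), shape)
extreme_ratio = surface_99[-1, -1] / centre
```

Under a log link the surface is multiplicative in the covariates, so joint extremes compound: here the 99% quantile at the corner is $e^{2.7} \approx 15$ times its central value.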

In non-stressed conditions, that is, where covariates take what would be considered to be ‘normal’ 
values, regulators may be interested in the range of probable values of the TED. Obtaining high 
quantile levels of the TED under the model could then help them identify situations which fall 
outside this range, which may be due to a change in the LOB regime or due to a particular event 
that will require their intervention. 
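One way to operationalise this monitoring, sketched under the assumption of a fitted lognormal model, is to flag any TED that exceeds its conditional 99% quantile; under a correctly specified model roughly 1% of events should be flagged, and a persistently higher rate would point to a regime change. The function name and the simulated inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def flag_extreme_teds(ted, mu, sigma, u=0.99):
    """Flag TEDs above the model's conditional u-quantile (lognormal TED assumed).

    ted   : observed TEDs, in the units of the fitted model
    mu    : fitted log-scale location for each event, e.g. X @ beta
    sigma : fitted log-scale standard deviation (scalar or per event)
    """
    z_u = NormalDist().inv_cdf(u)
    q_u = np.exp(np.asarray(mu) + np.asarray(sigma) * z_u)
    return np.asarray(ted) > q_u


rng = np.random.default_rng(3)
n = 100_000
mu = 7.5 + 0.4 * rng.standard_normal(n)              # hypothetical fitted locations
sigma = 0.8
ted = np.exp(mu + sigma * rng.standard_normal(n))    # data generated by the model itself
flags = flag_extreme_teds(ted, mu, sigma)
flag_rate = flags.mean()
```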


8 Proposals and conclusion 


Given the intra-day variation in liquidity demand, we propose that quoting requirements of
Designated Market Makers / Designated Sponsors be amended to include a provision for liquidity
replenishment after a shock. We have shown that the Threshold Exceedance Duration (TED) (Panayi
et al. 2014) is a good metric for the speed of this liquidity replenishment; indeed, it has been
defined so as to be able to incorporate any liquidity measure (e.g. spread, XLM) an exchange may
be interested in, and any liquidity threshold that would indicate that there is sufficient liquidity in 
the asset. We do not suggest an explicit target for the TED for each asset, but we suggest that 
this should vary according to the asset’s liquidity, as do current quoting requirements regarding 
maximum spreads and minimum posted volumes. 
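For concreteness, the following sketch extracts TEDs from a spread series given a threshold. It assumes that an exceedance begins when the spread first rises above the threshold and ends at the first later observation at or below it, with episodes still open at the end of the sample discarded; this is a simplified reading of the TED definition of Panayi et al. 2014, and the data are hypothetical.

```python
from typing import List, Optional, Sequence


def threshold_exceedance_durations(
    times: Sequence[float], spread: Sequence[float], threshold: float
) -> List[float]:
    """Durations of episodes in which the spread stays above `threshold`.

    An episode starts at the first observation whose spread exceeds the
    threshold and ends at the first later observation at or below it.
    An episode still open at the end of the series is discarded (censored).
    """
    durations: List[float] = []
    start: Optional[float] = None
    for t, s in zip(times, spread):
        if s > threshold and start is None:
            start = t                      # liquidity drops past the threshold
        elif s <= threshold and start is not None:
            durations.append(t - start)    # liquidity replenished
            start = None
    return durations


times = [0.0, 1.0, 1.5, 4.0, 5.0, 7.5, 8.0]           # seconds, hypothetical
spread = [0.01, 0.03, 0.02, 0.01, 0.04, 0.01, 0.05]   # hypothetical spreads
teds = threshold_exceedance_durations(times, spread, threshold=0.015)
# teds == [3.0, 2.5]; the exceedance starting at t = 8.0 is censored
```

Because the function only compares a liquidity measure with a threshold, any other measure (for example XLM) or threshold can be substituted without changes.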

We have presented a comprehensive study of different regression structures that could be used to 
model the variation in the TED. An appropriate modelling structure would be invaluable for both 
the operators of the exchange and the market makers who are subject to the quoting requirements. 
The former could use it to determine an appropriate target level of the TED, given the prevailing
conditions. The latter could use it because, when they act to replenish liquidity, they may have
several options about the way in which they do so, and the model could indicate the method that
would most improve the resilience of market liquidity.

In our modelling we employed various regression structures, starting from simple log-linear 
models to Generalised Linear Models (GLMs) and GAMLSS models. Under these approaches, we
compared the explanatory power of Lognormal, Gamma, Weibull and Generalised Gamma models. 
We also evaluated the additional explanatory power of allowing covariates to affect each of the 
parameters of the distribution, as in the GAMLSS structure. 

We determined that while the Generalised Gamma model had, in most cases, the highest
explanatory power of the distributions considered, its advantage over the Lognormal distribution was
minimal at best. In addition, considering a link function for a second parameter in the model
increased explanatory power, but again the increase was perhaps not sufficient to justify the more
involved GAMLSS modelling approach. In summary, a simple log-linear structure is, in our opinion,
the recommended approach to modelling the TED, as its estimation is very robust and its
explanatory power is similar to that of much more flexible models, such as the Generalised Gamma model.
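The recommended log-linear structure amounts to ordinary least squares on log TED. A minimal NumPy sketch, including the adjusted $R^2$ used to compare specifications; the covariates and coefficients below are synthetic stand-ins, not values from our dataset.

```python
import numpy as np


def fit_log_linear(X, ted):
    """OLS fit of log(TED) = X beta + error; returns beta and adjusted R^2.

    X must include an intercept column.
    """
    y = np.log(ted)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)   # p counts the intercept
    return beta, adj_r2


rng = np.random.default_rng(4)
n = 2_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)   # e.g. standardised spread, prevTEDavg
X = np.column_stack([np.ones(n), x1, x2])
ted = np.exp(7.5 + 0.5 * x1 + 0.3 * x2 + 0.8 * rng.standard_normal(n))   # synthetic TEDs
beta_hat, adj_r2 = fit_log_linear(X, ted)
```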


Name               Symbol  Country  Sector
CREDIT AGRICOLE    ACAp    FRANCE   Banking Services
ALLIANZ            ALVd    GERMANY  Insurance
BAYER              BAYNd   GERMANY  Biotechnology / Pharmaceuticals
BIC                BBp     FRANCE   Commercial Services / Supplies
BMW                BMWd    GERMANY  Automobiles / Auto Parts
DANONE             BNp     FRANCE   Food / Tobacco
AXA                CSp     FRANCE   Insurance
DAIMLER            DAId    GERMANY  Automobiles / Auto Parts
DEUTSCHE BANK      DBKd    GERMANY  Banking Services
JCDECAUX           DECp    FRANCE   Media / Publishing
DEUTSCHE TELEKOM   DTEd    GERMANY  Telecommunications Services
GROUPE EUROTUNNEL  GETp    FRANCE   Rails / Roads Transportation
PUMA               PUMd    GERMANY  Textiles / Apparel
HERMES INTL.       RMSp    FRANCE   Textiles / Apparel
RENAULT            RNOp    FRANCE   Automobiles / Auto Parts
SKY DEUTSCHLAND    SKYDd   GERMANY  Media / Publishing
AXEL SPRINGER      SPRd    GERMANY  Media / Publishing
TUI                TUI1d   GERMANY  Hotels / Entertainment Services
UBISOFT ENTM.      UBIp    FRANCE   Leisure Products
VOLKSWAGEN         VOWd    GERMANY  Automobiles / Auto Parts

Table 5: Information about the 20 European stocks used in the study. 
Figure 4: The adjusted R² values for models using the best subsets of covariates (of size 1 to
24, in this case) for a single trading day (the 17th of January 2012) for stock Credit Agricole in
the lognormal specification, with the median spread (top) or the 95th percentile (bottom) as the
threshold. Each row corresponds to a submodel, with the highlighted squares indicating whether
a variate has been included in the model or not. A dark square indicates statistical significance
at the 5% level, while light squares are not statistically significant at the 5% level. For instance,
row M3 corresponds to a specification with the following variates: intercept, ask, prevTEDavg and
spreads. The models are ranked by adjusted R², and we see that in this case the best scoring model
for the top plot is obtained using a subset of 18 covariates. We differentiate

Figure 5: Heatmap of the relative frequency with which parameters appear in the best daily models 
of every subspace (frequency in terms of the number of daily models over the 82 day period) for the 
Credit Agricole dataset using the daily median (left) or the 95th percentile (right) of the spread as 
the threshold value. So for instance, the element in row Mil and column lask indicates the relative 
frequency (in terms of the fraction of days over the 82 day period) by which the variate lask has 
appeared in the best model with 10 variates amongst all models with 10 variates. The bottom row 
is not informative since by construction all variates appeared amongst the best model having all 
variates. 




Figure 6: Heatmap of the relative frequency with which the parameters are found to be significant
at the 5% level (frequency in terms of the number of daily models over the 82 day period) for the 
Credit Agricole dataset, using the median and 95th percentile thresholds. 
Figure 7: Boxplots of the adjusted R² value obtained from fitting the regression models with
the various distributional assumptions separately for each day in our dataset, for the threshold
corresponding to the 5th decile of the spread.
Figure 8: The deviance for every distributional assumption, with a model fit every day over an
81-day trading period.

Figure 9: Boxplots of the adjusted R² value obtained from fitting the lognormal regression model,
comparing the explanatory power when relating only a single distribution parameter μ to covariates,
and when relating both μ and σ in a GAMLSS framework, for four different assets. (Top left): Credit
Agricole SA. (Top right): Bayer AG. (Lower left): UBISOFT Entertainment. (Lower right): TUI
AG.
Mu intercept: (min,max,q25,q50,q75)=(6.7,8.2,7.4,7.6,7.8)
Sigma intercept: (min,max,q25,q50,q75)=(0.7,1.6,0.8,0.8,0.9)
Mu intercept: (min,max,q25,q50,q75)=(6.4,8.6,7.5,7.9,8.1)
Sigma intercept: (min,max,q25,q50,q75)=(-1.4,-1.1,-1.3,-1.2,-1.2)
Figure 10: Boxplots of the model coefficients obtained from fitting the regression models with
the various distributional assumptions separately for each day in our dataset, for the threshold
corresponding to the median spread for stock Deutsche Telekom (stock symbol DTEd). (Top):
Lognormal. (Middle): Gamma. (Bottom): Weibull. Left: considering a link function for μ only.
Right: considering link functions for μ and σ.
Figure 11: Quantile plots. Upper left: Lognormal. Upper right: Gamma. Lower left: Weibull.
Lower right: Generalised gamma.
Figure 12: Quantile plots for lognormal, gamma, Weibull and generalised gamma when varying 2
covariates.

Figure 13: Quantile plots for lognormal, gamma, Weibull and generalised gamma when varying 2
covariates.